  1. Mar 2025
    1. Reviewer #2 (Public review):

      Summary:

      The authors present a new model for animal pose estimation. The core feature they highlight is the model's stability compared to existing models in terms of keypoint drift. The authors test this model across a range of new and existing datasets. The authors also test the model with two mice in the same arena. For the single animal datasets the authors show a decrease in sudden jumps in keypoint detection and the number of undetected keypoints compared with DeepLabCut and SLEAP. Overall average accuracy, as measured by root mean squared error, generally shows similar but sometimes superior performance to DeepLabCut and better performance compared to SLEAP. The authors confusingly don't quantify the performance of pose estimation in the multi (two) animal case, instead focusing on detecting individual identity. This multi-animal model is not compared with the model performance of the multi-animal mode of DeepLabCut or SLEAP.

      Strengths:

      The major strength of the paper is successfully demonstrating a model that is less likely to have incorrect large keypoint jumps compared to existing methods. As noted in the paper, this should lead to easier-to-interpret descriptions of pose and behavior to use in the context of a range of biological experimental workflows.

      Weaknesses:

      There are two main types of weaknesses in this paper. The first is a tendency to make unsubstantiated claims that suggest either model performance that is untested or misrepresents the presented data, or suggest excessively large gaps in current SOTA capabilities. One obvious example is in the abstract when the authors state ADPT "significantly outperforms the existing deep-learning methods, such as DeepLabCut, SLEAP, and DeepPoseKit." All tests in the rest of the paper, however, only discuss performance with DeepLabCut and SLEAP, not DeepPoseKit. At this point, there are many animal pose estimation models so it's fine they didn't compare against DeepPoseKit, but they shouldn't act like they did. Similarly odd presentations of results include statements like "Our method exhibited an impressive prediction speed of 90±4 frames per second (fps), faster than DeepLabCut (44±2 fps) and equivalent to SLEAP (106±4 fps)." Why is 90±4 fps considered "equivalent to SLEAP (106±4 fps)" and not slower? I agree they are similar but they are not the same. The paper's point of view of what is "equivalent" changes when describing how "On the single-fly dataset, ADPT excelled with an average mAP of 92.83%, surpassing both DeepLabCut and SLEAP (Figure 5B)". When one looks at Figure 5B, however, ADPT and DeepLabCut look identical. Beyond this, oddly only ADPT has uncertainty bars (no mention of what uncertainty is being quantified) and in fact, the bars overlap with the values corresponding to SLEAP and DeepPoseKit.

      In terms of making claims that seem to stretch the gaps in the current state of the field, the paper makes some seemingly odd and uncited statements like "Concerns about the safety of deep learning have largely limited the application of deep learning-based tools in behavioral analysis and slowed down the development of ethology" and "So far, deep learning pose estimation has not achieved the reliability of classical kinematic gait analysis" without specifying which classical gait analysis is being referred to. Certainly, existing tools like DeepLabCut and SLEAP are already widely cited and used for research.

      The other main weakness in the paper is the validation of the multi-animal pose estimation. The core point of the paper is pose estimation and anti-drift performance and yet there is no validation of either of these things relating to multi-animal video. All that is quantified is the ability to track individual identity with a relatively limited dataset of 10 mice IDs with only two in the same arena (and see note about train and validation splits below). While individual tracking is an important task, that literature is not engaged with (i.e. papers like Walter and Couzin, eLife, 2021: https://doi.org/10.7554/eLife.64000) and the results in this paper aren't novel compared to that field's state of the art. On the other hand, while multi-animal pose estimation is also an important problem the paper doesn't engage with those results either. The two methods already used for comparison in the paper, SLEAP and DeepPoseKit, already have multi-animal modes and multi-animal annotated datasets but none of that is tested or engaged with in the paper. The paper notes many existing approaches are two-step methods, but, for practitioners, the difference is not enough to warrant a lack of comparison. The authors state that "The evaluation of our social tracking capability was performed by visualizing the predicted video data (see supplement Videos 3 and 4)." While the authors report success maintaining mouse ID, when one actually watches the key points in the video of the two mice (only a single minute was used for validation) the pose estimation is relatively poor with tails rarely being detected and many pose issues when the mice get close to each other.

      Finally, particularly in the methods section, there were a number of places where what was actually done wasn't clear. For example in describing the network architecture, the authors say "Subsequently, network separately process these features in three branches, compute features at scale of one-fourth, one-eight and one-sixteenth, and generate one-eight scale features using convolution layer or deconvolution layer." Does only the one-eight branch have deconvolution or do the other branches also? Similarly, for the speed test, the authors say "Here we evaluate the inference speed of ADPT. We compared it with DeepLabCut and SLEAP on mouse videos at 1288 x 964 resolution", but in the methods section they say "The image inputs of ADPT were resized to a size that can be trained on the computer. For mouse images, it was reduced to half of the original size." Were different image sizes used for training and validation? Or did ADPT not use 1288 x 964 resolution images as input, which would obviously have major implications for the speed comparison? Similarly, for the individual ID experiments, the authors say "In this experiment, we used videos featuring different identified mice, allocating 80% of the data for model training and the remaining 20% for accuracy validation." Were frames from each video randomly assigned to the training or validation sets? Frames from the same video are very correlated (two frames could be just 1/30th of a second different from each other), and so if training and validation frames are interspersed with each other, validation performance doesn't indicate much about performance on more realistic use cases (i.e. using models trained during the first part of an experiment to maintain ids throughout the rest of it).

      Editors' note: None of the original reviewers responded to our request to re-review the manuscript. The attached assessment statement is the editor's best attempt at assessing the extent to which the authors addressed the outstanding concerns from the previous round of revisions.

    2. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This study introduces a useful deep learning-based algorithm that tracks animal postures with reduced drift by incorporating transformers for more robust keypoint detection. The efficacy of this new algorithm for single-animal pose estimation was demonstrated through comparisons with two popular algorithms. However, the analysis is incomplete and would benefit from comparisons with other state-of-the-art methods and consideration of multi-animal tracking.

      First, we would like to express our gratitude to the eLife editors and reviewers for their thorough evaluation of our manuscript. ADPT aims to improve the accuracy of body point detection and tracking in animal behavior, facilitating more refined behavioral analyses. The insights provided by the reviewers have greatly enhanced the quality of our work, and we have addressed their comments point-by-point.

      In this revision, we have included additional quantitative comparisons of multi-animal tracking capabilities between ADPT and other state-of-the-art methods. Specifically, we have added evaluations involving homecage social mice and marmosets to comprehensively showcase ADPT’s advantages from various perspectives. This additional analysis will help readers better understand how ADPT effectively overcomes point drift and expands its applicability in the field.

      Reviewer #1:

      In this paper, the authors introduce a new deep learning-based algorithm for tracking animal poses, especially in minimizing drift effects. The algorithm's performance was validated by comparing it with two other popular algorithms, DeepLabCut and LEAP. The accessibility of this tool for biological research is not clearly addressed, despite its potential usefulness. Researchers in biology often have limited expertise in deep learning training, deployment, and prediction. A detailed, step-by-step user guide is crucial, especially for applications in biological studies.

      We appreciate the reviewers' acknowledgment of our work. While ADPT demonstrates superior performance compared to DeepLabCut and SLEAP, we recognize that the absence of a user-friendly interface may hinder its broader application, particularly for users with a background solely in biology. In this revision, we have enhanced the command-line version of the user tutorial to provide a clear, step-by-step guide. Additionally, we have developed a simple graphical user interface (GUI) to further support users who may not have expertise in deep learning, thereby making ADPT more accessible for biological research.

      The proposed algorithm focuses on tracking and is compared with DLC and LEAP, which are more adept at detection rather than tracking.

      In the field of animal pose estimation, the distinction between detection and tracking is often blurred. For instance, the title of the paper "SLEAP: A deep learning system for multi-animal pose tracking" refers to "tracking," while "detection" is characterized as "pose estimation" in the body text. Similarly, "Multi-animal pose estimation, identification, and tracking with DeepLabCut" uses "tracking" in the title, yet "detection" is also mentioned in the pose estimation section. We acknowledge that referencing these articles may have contributed to potential confusion.

      To address this, we have clarified the distinction between "tracking" and "detection" in the Results section under "Anti-drift pose tracker" (see lines 118-119). In this paper, we now explicitly use “track” to refer to the tracking of all body points or poses of an individual, and “detect” for specific keypoints.

      Reviewer #1 recommendations:

      (1) DLC and LEAP are mainly good in detection, not tracking. The authors should compare their ADPT algorithm with idtracker.ai, ByteTrack, and other advanced tracking algorithms, including recent track-anything algorithms.

      (2) DeepPoseKit is outdated and no longer maintained; a comparison with the T-REX algorithm would be more appropriate.

      We appreciate the reviewer's suggestion for a more comprehensive comparison and acknowledge the importance of including these advanced tracking algorithms. However, we have not yet found suitable publicly available datasets for such comparative testing. We appreciate this insight and will consider incorporating T-REX into future comparisons.

      (3) The authors primarily compared their performance using custom data. A systematic comparison with published data, such as the dataset reported in the paper "Multi-animal pose estimation, identification, and tracking with DeepLabCut," is necessary. A detailed comparison of the performances between ADPT and DLC is required.

      In the previous version of our manuscript, we included the SLEAP single-fly public dataset and the OMS_dataset from OpenMonkeyStudio for performance comparisons. We recognize that these datasets were not comprehensive. In this revision, we have added the marmoset dataset from "Multi-animal pose estimation, identification, and tracking with DeepLabCut" and a customized homecage social mice dataset to enhance our comparative analysis of multi-animal pose estimation performance. Our comprehensive comparison reveals that ADPT outperforms both DLC and SLEAP, as discussed in the Results section under "ADPT can be adapted for end-to-end pose estimation and identification of freely social animals" (Figure 1, see lines 303-332).

      (4) Given the focus on biological studies, an easy-to-use interface and introduction are essential.

      In this revision, we have not only developed a GUI for ADPT but also included a more detailed tutorial. This can be accessed at https://github.com/tangguoling/ADPT-TOOLBOX

      Reviewer #2:

      The authors present a new model for animal pose estimation. The core feature they highlight is the model's stability compared to existing models in terms of keypoint drift. The authors test this model across a range of new and existing datasets. The authors also test the model with two mice in the same arena. For the single animal datasets the authors show a decrease in sudden jumps in keypoint detection and the number of undetected keypoints compared with DeepLabCut and SLEAP. Overall average accuracy, as measured by root mean squared error, generally shows similar but sometimes superior performance to DeepLabCut and better performance compared to SLEAP. The authors confusingly don't quantify the performance of pose estimation in the multi (two) animal case instead focusing on detecting individual identity. This multi-animal model is not compared with the model performance of the multi-animal mode of DeepLabCut or SLEAP.

      We appreciate the reviewer's thoughtful assessment of our manuscript. Our study focuses on addressing the issue of keypoint drift prevalent in animal pose estimation methods like DeepLabCut and SLEAP. During the model design process, we discovered that the structure of our model also enhances performance in identifying multiple animals. Consequently, we included some results related to multi-animal identity recognition in our manuscript.

      In recent developments, we are working to broaden the applicability of ADPT for multi-animal pose estimation and identity recognition. Given that our manuscript emphasizes pose estimation, we have added a comparison of anti-drift performance in multi-animal scenarios in this revision. This quantifies ADPT's capability to mitigate drift in multi-animal pose estimation.

      Using our custom Homecage social mice dataset, we compared ADPT with DeepLabCut and SLEAP. The results indicate that ADPT achieves more accurate anti-drift pose estimation for two mice, with superior keypoint detection accuracy. Furthermore, we also evaluated pose estimation accuracy on the publicly available marmoset dataset, where ADPT outperformed both DeepLabCut and SLEAP. These findings are discussed in the Results section under "ADPT can be adapted for end-to-end pose estimation and identification of freely social animals."

      The first is a tendency to make unsubstantiated claims that suggest either model performance that is untested or misrepresents the presented data, or suggest excessively large gaps in current SOTA capabilities. One obvious example is in the abstract when the authors state ADPT "significantly outperforms the existing deep-learning methods, such as DeepLabCut, SLEAP, and DeepPoseKit." All tests in the rest of the paper, however, only discuss performance with DeepLabCut and SLEAP, not DeepPoseKit. At this point, there are many animal pose estimation models so it's fine they didn't compare against DeepPoseKit, but they shouldn't act like they did.

      We appreciate the reviewer's feedback regarding unsubstantiated claims in our manuscript. Upon careful review, we acknowledge that our previous revisions inadvertently included statements that may misrepresent our model's performance. In particular, we have revised the abstract to eliminate the mention of DeepPoseKit, as our comparisons focused exclusively on DeepLabCut and SLEAP.

      In addition to this correction, we have thoroughly reviewed the entire manuscript to address other instances of ambiguity and ensure that our claims are well-supported by the data presented. Thank you for bringing this to our attention; we are committed to maintaining the integrity of our claims throughout the paper.

      In terms of making claims that seem to stretch the gaps in the current state of the field, the paper makes some seemingly odd and uncited statements like "Concerns about the safety of deep learning have largely limited the application of deep learning-based tools in behavioral analysis and slowed down the development of ethology" and "So far, deep learning pose estimation has not achieved the reliability of classical kinematic gait analysis" without specifying which classical gait analysis is being referred to. Certainly, existing tools like DeepLabCut and SLEAP are already widely cited and used for research.

      In this revision, we have carefully reviewed the entire manuscript and addressed the instances of seemingly odd and unsubstantiated claims. Specifically, we have revised the statements "largely limited" to "limited" to ensure accuracy and clarity. Additionally, we thoroughly reviewed the citation list to ensure proper attribution, incorporating references such as "A deep learning-based toolbox for Automated Limb Motion Analysis (ALMA) in murine models of neurological disorders" to better substantiate our claims and provide a clearer context.

      We have also added an additional section to comprehensively discuss the applications of widely-used tools like DeepLabCut and SLEAP in behavioral research. This new section elaborates on the challenges and limitations researchers encounter when applying these methods, highlighting both their significant contributions and the areas where improvements are still needed.

      The other main weakness in the paper is the validation of the multi-animal pose estimation. The core point of the paper is pose estimation and anti-drift performance and yet there is no validation of either of these things relating to multi-animal video. All that is quantified is the ability to track individual identity with a relatively limited dataset of 10 mice IDs with only two in the same arena (and see note about train and validation splits below). While individual tracking is an important task, that literature is not engaged with (i.e. papers like Walter and Couzin, eLife, 2021: https://doi.org/10.7554/eLife.64000) and the results in this paper aren't novel compared to that field's state of the art. On the other hand, while multi-animal pose estimation is also an important problem the paper doesn't engage with those results either. The two methods already used for comparison in the paper, SLEAP and DeepPoseKit, already have multi-animal models and multi-animal annotated datasets but none of that is tested or engaged with in the paper. The paper notes many existing approaches are two-step methods, but, for practitioners, the difference is not enough to warrant a lack of comparison.

      We appreciate the reviewer's insights regarding the validation of multi-animal pose estimation in our paper. While our primary focus has been on pose estimation and anti-drift performance, we recognize the importance of validating these aspects within the context of multi-animal videos.

      In this revision, we have included a comparison of ADPT's anti-drift performance in multi-animal pose estimation, utilizing our custom Homecage social mouse dataset (Figure 1A). Our findings indicate that ADPT achieves more accurate pose estimation for two mice while significantly reducing keypoint drift, outperforming both DeepLabCut and SLEAP (see lines 311-322). We trained each model three times, and this figure presents the results from one of those training sessions. We calculated the average RMSE between predictions and manual labels, demonstrating that ADPT achieved an average RMSE of 15.8 ± 0.59 pixels, while DeepLabCut (DLC) and SLEAP recorded RMSEs of 113.19 ± 42.75 pixels and 94.76 ± 1.95 pixels, respectively (Figure 1C). ADPT achieved an accuracy of 6.35 ± 0.14 pixels based on the DLC evaluation metric across all body parts of the mice, while DLC reached 7.49 ± 0.2 pixels (Figure 1D). ADPT achieved 8.33 ± 0.19 pixels using the SLEAP evaluation metric across all body parts of the mice, compared to SLEAP’s 9.82 ± 0.57 pixels (Figure 1E).
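
      As an illustration of the kind of summary reported above, the following is a minimal sketch of an RMSE between predicted and manually labeled keypoints, plus a mean and spread across repeated training runs. It is not the authors' evaluation code: the function names are hypothetical, the toy coordinates are invented, and treating the "±" as a sample standard deviation over the three runs is an assumption, since the response does not say what the spread represents.

```python
import math
from statistics import mean, stdev


def rmse(pred, truth):
    """Root mean squared Euclidean error over paired (x, y) keypoints, in pixels."""
    if len(pred) != len(truth) or not pred:
        raise ValueError("pred and truth must be non-empty and of equal length")
    squared = [
        (px - tx) ** 2 + (py - ty) ** 2
        for (px, py), (tx, ty) in zip(pred, truth)
    ]
    return math.sqrt(sum(squared) / len(squared))


def summarize_runs(per_run_rmse):
    """Mean and sample standard deviation across independent training runs.

    Assumption: the spread is a sample standard deviation; the source
    does not specify whether it is an SD, an SEM, or something else.
    """
    return mean(per_run_rmse), stdev(per_run_rmse)


# Hypothetical toy data: one keypoint is off by a 3-4-5 triangle, one is exact.
truth = [(10.0, 10.0), (20.0, 20.0)]
pred = [(13.0, 14.0), (20.0, 20.0)]
single_run_rmse = rmse(pred, truth)
```

      Because RMSE squares each error before averaging, a few large keypoint jumps dominate the value, which is consistent with RMSE being used here as a drift-sensitive measure alongside the per-tool evaluation metrics.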

      Furthermore, we have conducted pose estimation accuracy evaluations on the publicly available marmoset dataset from DeepLabCut, where ADPT also demonstrated superior performance compared to DeepLabCut and SLEAP. These results can be found in the "ADPT can be adapted for end-to-end pose estimation and identification of freely social animals" section of the Results (see lines 323-329).

      We acknowledge the existing literature on multi-animal tracking, such as the work by Walter and Couzin (2021). While individual tracking is crucial, our primary focus lies in the effective tracking of animal poses and minimizing drift during this process. This dual emphasis on pose tracking and anti-drift performance distinguishes our work and aligns with ongoing advancements in the field. Engaging with the relevant literature highlights the importance of contextualizing our results within the broader tracking literature: while our findings may overlap with existing methods, the focus on improving tracking stability and reducing drift is a valuable contribution to the field. Thank you for your valuable feedback, which has helped us improve the robustness of our manuscript.

      The authors state that "The evaluation of our social tracking capability was performed by visualizing the predicted video data (see supplement Videos 3 and 4)." While the authors report success maintaining mouse ID, when one actually watches the key points in the video of the two mice (only a single minute was used for validation) the pose estimation is relatively poor with tails rarely being detected and many pose issues when the mice get close to each other.

      We acknowledge that there are indeed challenges in pose estimation, particularly when the two mice get close to each other, leading to tracking failures and infrequent detection of tails in the predicted videos. The reasons for these issues can be summarized as follows:

      Lack of Training Data from Real Social Scenarios: The training data used for the social tracking assessment were primarily derived from the Mix-up Social Animal Dataset, which does not fully capture the complexities of real social interactions. In future work, we plan to incorporate a blend of real social data and the Mix-up data for model training. Specifically, we aim to annotate images where two animals are in close proximity or interacting to enhance the model's understanding of genuine social behaviors.

      Challenges in Tail Tracking in Social Contexts: Tracking the tails of mice in social situations remains a significant challenge. To validate this, we have added an assessment of tracking performance in real social settings using homecage data. Our findings indicate that using annotated data from real environments significantly improves tail tracking accuracy, as demonstrated in the supplementary video.

      We appreciate your feedback, which highlights critical areas for improvement in our model.

      Finally, particularly in the methods section, there were a number of places where what was actually done wasn't clear.

      We have carefully reviewed and revised the corresponding parts to clarify the previously incomprehensible statements. Thank you for your valuable feedback, which has helped enhance the clarity of our methods.

      For example in describing the network architecture, the authors say "Subsequently, network separately process these features in three branches, compute features at scale of one-fourth, one-eight and one-sixteenth, and generate one-eight scale features using convolution layer or deconvolution layer." Does only the one-eight branch have deconvolution or do the other branches also?

      We apologize for the confusion this has caused. Upon reviewing our manuscript, we identified an error in the diagram. In the revised version, we have clarified that the model samples feature maps at multiple resolutions and ultimately integrates them at the 1/8 resolution for feature fusion. Specifically, the 1/4 feature map from ResNet50's stack 2 is processed through max-pooling and convolution to generate a 1/8 feature map. Additionally, the 1/4 feature map from ResNet50's stack 2 is also transformed into a 1/8 feature map using a convolution operation with a stride of 2. Finally, both the input and output of the transformer are at the 1/16 resolution, which can be trained on a 2080Ti GPU. The 1/16 feature map is then upsampled to produce the final 1/8 feature map. We have updated the manuscript to reflect these changes, and we also modified the model architecture diagram for better clarity.
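
      The scale bookkeeping described above can be checked with a short sketch. This is only shape arithmetic, not the ADPT network: the 480 x 640 input size is a hypothetical example, and the ceiling division assumes "same"-style padding, which the response does not state.

```python
def feature_shape(height, width, stride):
    """Spatial size after downsampling by `stride`, using ceiling division.

    Assumption: "same"-style padding, so odd sizes round up.
    """
    return (-(-height // stride), -(-width // stride))


# Hypothetical input size (height, width); not taken from the manuscript.
input_hw = (480, 640)

# 1/4-resolution feature map, as produced by an early backbone stage.
quarter = feature_shape(*input_hw, 4)

# A stride-2 convolution (or 2x2 max-pooling) halves 1/4 down to 1/8.
eighth_from_quarter = feature_shape(*quarter, 2)

# The transformer works at 1/16 resolution ...
sixteenth = feature_shape(*input_hw, 16)

# ... and a 2x upsampling brings its output back to 1/8.
eighth_from_sixteenth = (sixteenth[0] * 2, sixteenth[1] * 2)

# Both paths must agree in size before they can be fused at 1/8 scale.
shapes_match = eighth_from_quarter == eighth_from_sixteenth
```

      For input sizes that are not multiples of 16, the two paths can differ by a pixel, so a real implementation would need to crop or pad before fusion.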

      Similarly, for the speed test, the authors say "Here we evaluate the inference speed of ADPT. We compared it with DeepLabCut and SLEAP on mouse videos at 1288 x 964 resolution", but in the methods section they say "The image inputs of ADPT were resized to a size that can be trained on the computer. For mouse images, it was reduced to half of the original size." Were different image sizes used for training and validation? Or did ADPT not use 1288 x 964 resolution images as input, which would obviously have major implications for the speed comparison?

      For our inference speed evaluation, all models, including ADPT, used images with a resolution of 1288 x 964. In ADPT's processing pipeline, the first layer is a resizing layer designed to compress the images to a scale determined by the global scale parameter. For the mouse images, we set the global scale to 0.5, allowing our GPU to handle the data at that resolution during transformer training.

      We recorded the time taken by ADPT to process the entire 15-minute mouse video, which included the time taken for the resizing operation, and subsequently calculated the frames per second (FPS). We have clarified this process in the manuscript, particularly in the "Network Architecture" section, where we specify: "Initially, ADPT will resize the images to a scale (a hyperparameter, consistent with the global scale in the DLC configuration)."
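
      The measurement described above can be sketched as follows. This is an illustrative timing harness, not the authors' benchmark: `process_frame` is a hypothetical stand-in for the full per-frame pipeline (resizing included), and the 30 fps recording rate used in the example is an assumption.

```python
import time


def scaled_size(width, height, global_scale):
    """Image size after the initial resizing layer."""
    return round(width * global_scale), round(height * global_scale)


def frames_per_second(n_frames, elapsed_seconds):
    """Throughput over a whole video, in frames per second."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return n_frames / elapsed_seconds


def time_inference(frames, process_frame):
    """Time the whole loop so that resizing inside process_frame is counted."""
    start = time.perf_counter()
    n_frames = 0
    for frame in frames:
        process_frame(frame)  # hypothetical: resize + network forward pass
        n_frames += 1
    elapsed = time.perf_counter() - start
    return frames_per_second(n_frames, elapsed)


# 1288 x 964 input with a global scale of 0.5, as stated in the response.
network_input = scaled_size(1288, 964, 0.5)

# Hypothetical example: a 15-minute video at an assumed 30 fps recording
# rate has 27000 frames; processing it in 300 s corresponds to 90 fps.
example_fps = frames_per_second(15 * 60 * 30, 300.0)
```

      Timing the whole loop, rather than only the network forward pass, keeps the comparison fair when tools differ in how much preprocessing they do internally.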

      Similarly, for the individual ID experiments, the authors say "In this experiment, we used videos featuring different identified mice, allocating 80% of the data for model training and the remaining 20% for accuracy validation." Were frames from each video randomly assigned to the training or validation sets? Frames from the same video are very correlated (two frames could be just 1/30th of a second different from each other), and so if training and validation frames are interspersed with each other validation performance doesn't indicate much about performance on more realistic use cases (i.e. using models trained during the first part of an experiment to maintain ids throughout the rest of it.)

      In our study, we actually utilized the first 80% of frames from each video for model training and the remaining 20% for testing the model's ID tracking accuracy. We have revised the relevant description in the manuscript to clarify this process. The updated description can be found in the "Datasets" section under "Mouse Videos of Different Individuals."
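
      The difference between the split described here and the interspersed split the reviewer was concerned about can be sketched as below. The helper names are hypothetical and the frame count is an arbitrary example; only the 80/20 proportion comes from the response.

```python
import random


def temporal_split(n_frames, train_fraction=0.8):
    """First block of frames for training, the remainder for testing."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    cut = int(n_frames * train_fraction)
    return list(range(cut)), list(range(cut, n_frames))


def random_split(n_frames, train_fraction=0.8, seed=0):
    """Frames assigned to training at random, interspersed with test frames."""
    rng = random.Random(seed)
    n_train = int(n_frames * train_fraction)
    train = set(rng.sample(range(n_frames), n_train))
    test = [i for i in range(n_frames) if i not in train]
    return sorted(train), test


def leakage_fraction(train_idx, test_idx):
    """Share of test frames that have an immediately adjacent training frame."""
    train = set(train_idx)
    adjacent = sum(1 for i in test_idx if (i - 1) in train or (i + 1) in train)
    return adjacent / len(test_idx)


# Hypothetical video of 1000 frames.
temporal_leak = leakage_fraction(*temporal_split(1000))
random_leak = leakage_fraction(*random_split(1000))
```

      With the temporal split only the single frame at the boundary neighbors a training frame, whereas with a random split most test frames do, which is why the latter tends to overstate how well a model generalizes to later parts of an experiment.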

    1. eLife Assessment

      This useful manuscript reports on the crystal structures of two glycosaminoglycan (GAG) lyases from the PL35 family, along with in vitro enzyme activity assays and comprehensive structure-guided mutagenesis. The authors have addressed key concerns by incorporating additional docking analyses, validating the role of His188 in alginate degradation, and providing ICP-MS data to examine Mn²⁺ binding. While these improvements enhance the study, the study is incomplete due to the lack of enzyme-substrate complex structures and reliance on modeling which still limit mechanistic insight. Nonetheless, the revised manuscript presents a more complete analysis that will be of interest to specialists in carbohydrate-active enzymes.

    2. Reviewer #1 (Public review):

      Summary:

      This study aims to uncover molecular and structural details underlying the broad substrate specificity of glycosaminoglycan lyases belonging to a specific family (PL35). They determined the crystal structures of two such enzymes, conducted in vitro enzyme activity assays, and a thorough structure-guided mutagenesis campaign to interrogate the role of specific residues. They made progress towards achieving their aims and I appreciate the attempt of the authors to address my initial comments on the paper.

      Impact on the field:

      I expect this work will have limited impact on the field, although it does stand on its own as a solid piece of structure-function analysis.

      Strengths:

      The major strengths of the study were the combination of structure and enzyme activity assays, comprehensive structural analysis, as well as a thorough structure-guided mutagenesis campaign.

      Weaknesses:

      (Before revision) -the authors claim to have done an ICP-MS experiment to show Mn2+ binds to their enzyme, but did not present the data. The authors could have used the anomalous scattering properties of Mn2+ at the synchrotron to determine the presence and location of this cation (i.e. fluorescence spectra, and/or anomalous data collection at the Mn2+ absorption peak).
      *comment after revision: I appreciate that the authors included this data now, and it looks fine.

      (Before revision) -the authors have an over-reliance on molecular docking for understanding the position of substrates bound to the enzyme. The docking analysis performed was cursory at best; Autodock Vina is a fine program but more rigorous software could have been chosen, as well as molecular dynamics simulations. As well, the authors do not use any substrate/product-bound structures from the broader PL enzyme family to guide the placement of the substrates in the GAGases, and interpret the molecular docking models.
      *comment after revision: the authors used another docking program, which is fine, but did not do any MD analysis or comment on why not. Also maybe it is just me but I still do not see a figure explicitly showing an overlay/superposition of the docking results with crystal structures of similar enzymes with similar ligands. The authors do have a statement in this regard but I believe a figure (e.g. an additional panel on S2) would be very helpful to the reader.

      (Before revision)-the conclusion that the structures of GAGase II and VII are most similar to the structures of alginate lyases (Table 2 data), and the authors' reliance on DALI, are both questioned. DALI uses a global alignment algorithm, which when used for multi-domain enzymes such as these tends to result in sub-optimal alignment of active site residues, particularly if the active site is formed between the two domains as is the case here. The authors should evaluate local alignment methods focused on optimization of the superposition of a single domain; these methods may result in a more appropriate alignment of the active site residues, and different alignment statistics. This may influence the overall conclusion of the evolutionary history of these PL35 enzymes.
      *comment after revision: I'm not sure the authors understood my suggestion as the reply reiterates the original conclusions. I suggest local structural alignment of *only* the toroid and antiparallel β-sheet domains, not global alignment of both domains, as this would improve the accuracy of the structural similarity conclusions.

      (Before revision)-the data on the GAGase III residue His188 is not well interpreted; substitution of this residue clearly impacts HA and HS hydrolysis as well. The data on the impact on alginate hydrolysis is weak, which could be due to the fact that the WT enzyme has poor activity against alginate to start with.<br /> *comment after revision: I appreciate that the authors used higher amounts of H188A variants and still do not see activity on alginate, which strengthens the conclusions regarding this substrate. However this variant also has decreased activity against HS (Figure 5C) and thus H188 appears to be important for more substrates than just alginate. The discussion section should be updated accordingly.

      (Before revision)-the authors did not use the words "homology", "homologous", or "homolog" correctly (these terms mean the subjects have a known evolutionary relationship, which may or may not be known in the contexts the authors used these targets); the words "similarity" and "similar" are recommended to be used instead.<br /> *comment after revision: I thank the authors for addressing this.

      (Before revision)-the authors discuss a "shorter" cavity in GAGases, which does not make sense, and is not supported by any figure or analysis. I recommend a figure with a surface representation of the various enzymes of interest, with dimensions of the cavity labeled (as a supplemental figure). The authors also do not specifically define what subsites are in the context of this family of enzymes, nor do they specifically label or indicate the location of the subsites on the figures of the GAGase II and IV enzyme structures.<br /> *comment after revision: I thank the authors for improving their figures and text description on this point.

    3. Reviewer #3 (Public review):

      Summary:

      The authors previously characterized the substrate specificity of several polysaccharide lyases from family PL35 (CAZy) and discovered their unusually broad substrate specificity: they are able to degrade three types of GAGs belonging to the HA, CS, and HS classes.
      In this study they determined the 3D structures of two lyases from this family and identified several residues essential for substrate degradation. Comparison with lyases from other PL families that have the same fold allowed them to propose an Asn, a Tyr and a His as essential for catalysis. One of the characterized lyases can also degrade alginate, and they established a specific His residue as necessary, but not sufficient by itself, for activity toward this substrate.
      Attempts to obtain crystals with substrate or products were unsuccessful; therefore the authors resorted to modeling substrate into the determined structures. The obtained models led them to propose a catalytic mechanism that generally reflects the mechanism previously proposed for lyases with this fold.

      Unfortunately, they have no definitive explanation for the broad specificity of the PL35 lyases but suggest that it is related to a shorter substrate binding cleft with a large open space on the nonreducing end of the substrate.

      Strengths:

      The determination of the 3D structures of two PL35 lyases allows comparing them to other lyases with a similar fold. The structures show a shorter substrate binding cleft that might be the reason for the broader substrate specificity. Essential roles of several residues in catalysis and/or substrate binding were established by mutagenesis.

      Weaknesses:

      The main weakness is the lack of a structure of an enzyme-substrate/product complex. While the determined structures confirm the predicted two-domain fold with a helical toroid domain and a double beta-sheet domain, the explanation for the broad specificity is lacking, except for the suggestion that it has to do with a shorter substrate binding cleft. The enzymatic mechanism is hypothesized based on models rather than supported by an experimentally determined structure of the complex.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study aims to uncover molecular and structural details underlying the broad substrate specificity of glycosaminoglycan lyases belonging to a specific family (PL35). They determined the crystal structures of two such enzymes, conducted in vitro enzyme activity assays, and a thorough structure-guided mutagenesis campaign to interrogate the role of specific residues. They made progress towards achieving their aims but I see significant holes in data that need to be determined and in the authors' analyses.

      Impact on the field:

      I expect this work will have a limited impact on the field, although, with additional experimental work and better analysis, this paper will be able to stand on its own as a solid piece of structure-function analysis.

      Strengths:

      The major strengths of the study were the combination of structure and enzyme activity assays, comprehensive structural analysis, as well as a thorough structure-guided mutagenesis campaign.

      Weaknesses:

      There were several weaknesses, particularly:

      (1) The authors claim to have done an ICP-MS experiment to show Mn2+ binds to their enzyme but did not present the data. The authors could have used the anomalous scattering properties of Mn2+ at the synchrotron to determine the presence and location of this cation (i.e. fluorescence spectra, and/or anomalous data collection at the Mn2+ absorption peak).

      Thank you for your kind comment and suggestion. Many studies have used ICP-MS to detect metal ions within proteins (doi: 10.1016/j.jbc.2023.103047; doi: 10.1074/jbc.RA119.011790), so we used this method to determine the type of atoms within GAGases. In the revised manuscript, the ICP-MS data are presented in Supplemental Table S1.

      (2) The authors have an over-reliance on molecular docking for understanding the position of substrates bound to the enzyme. The docking analysis performed was cursory at best; Autodock Vina is a fine program but more rigorous software could have been chosen, as well as molecular dynamics simulations. The authors also do not use any substrate/product-bound structures from the broader PL enzyme family to guide the placement of the substrates in the GAGases and to interpret the molecular docking models.

      Thank you for your kind comments. The interaction between the enzyme and ligand should be confirmed by resolving the structure of the enzyme-ligand complex. Unfortunately, we tried to prepare co-crystals of GAGases with various oligosaccharide substrates but ultimately failed. Thus, we used docking with Autodock Vina to explain the catalytic mechanism of the polysaccharide lyases, although this method may be questionable. In the revised manuscript, we predicted the substrate binding site of GAGase II using Caver Web 1.2 and also performed molecular docking near the substrate binding site using Molecular Operating Environment (MOE) to verify the accuracy of the docking results (Figure 6, Supplemental Figure S4). In addition, a series of enzyme-substrate complex structures of identified PL family enzymes with structural similarities to the GAGases are shown in Supplemental Figure S2. The positions of the catalytic cavities and the substrate binding modes in these structures are similar to those in the molecular docking results, which may further corroborate the reliability of our docking results.

      (3) The conclusion that the structures of GAGase II and VII are most similar to the structures of alginate lyases (Table 2 data), and the authors' reliance on DALI, are both questioned. DALI uses a global alignment algorithm, which when used for multi-domain enzymes such as these tends to result in sub-optimal alignment of active site residues, particularly if the active site is formed between the two domains as is the case here. The authors should evaluate local alignment methods focused on the optimization of the superposition of a single domain; these methods may result in a more appropriate alignment of the active site residues and different alignment statistics. This may influence the overall conclusion of the evolutionary history of these PL35 enzymes.

      Thank you for your kind question. As you suggested, multiple structural alignments were carried out separately for the (α/α)<sub>n</sub> toroid domain and the antiparallel β-sheet domain, based on the structures of GAGs/alginate lyases from the PL5, PL8, PL12, PL15, PL17, PL21, PL23, PL36, PL38 and PL39 families. The results showed that the overall structure of GAGases is more similar to that of the PL15, PL17 and PL39 family alginate lyases, which have an (α/α)<sub>6</sub> toroid and an antiparallel β-sheet domain (Table 3). In terms of the toroid and antiparallel β-sheet domains, most of them have an (α/α)<sub>6</sub> toroid and an antiparallel β-sheet, as shown in Table 3. We also noticed that GAGases possess an (α/α)<sub>6</sub> toroid structure rather than an (α/α)<sub>7</sub> toroid structure, and revised the relevant statement in the manuscript.

      (4) The data on the GAGase III residue His188 is not well interpreted; substitution of this residue clearly impacts HA and HS hydrolysis as well. The data on the impact on alginate hydrolysis is weak, which could be due to the fact that the WT enzyme has poor activity against alginate to start with.

      Thank you very much for your helpful comments and questions. To test your suggestion that the weak effect on alginate hydrolysis could be due to the poor activity of wild-type GAGase III, we degraded alginate using different enzyme amounts (3 to 30 μg) and analyzed the degradation products. The results showed that the alginate-degrading activity of GAGase III-H188A and GAGase III-H188N was abolished, even at a quite high ratio of the mutated enzyme to substrate such as 30 μg enzyme to 30 μg substrate (Supplemental Figure S3A), while their GAG-degrading activity was only partially affected, indicating that this residue plays a more important role in the digestion of alginate than of other substrates. Unfortunately, we were unable to confer alginate-degrading ability on GAGase II through the N191H substitution. Therefore, we suggest that His<sup>188</sup> plays a key role in the specificity of alginate degradation by GAGase III, but that other determinants also contribute to this process. We will try more methods to obtain enzyme-substrate co-crystal structures and explain the substrate-selective mechanism in future studies.

      (5) The authors did not use the words "homology", "homologous", or "homolog" correctly (these terms mean the subjects have a known evolutionary relationship, which may or may not be known in the contexts the authors used these targets); the words "similarity" and "similar" are recommended to be used instead.

      Thank you for your helpful suggestions. We have revised the relevant part of the description in the manuscript.

      (6) The authors discuss a "shorter" cavity in GAGases, which does not make sense and is not supported by any figure or analysis. I recommend a figure with a surface representation of the various enzymes of interest, with dimensions of the cavity labeled (as a supplemental figure). The authors also do not specifically define what subsites are in the context of this family of enzymes, nor do they specifically label or indicate the location of the subsites on the figures of the GAGase II and IV enzyme structures.

      Thank you for your helpful suggestions. Figures (Supplemental Figure S2) with surface representations of the GAGase II and some structurally similar GAGs/alginate lyases with the dimensions of the cavity labeled, were added to the supplementary data as you suggested. Considering the correlation between enzyme specificity and substrate binding sites, we speculated that a shorter substrate binding cavity might allow the enzyme to accommodate a wider variety of substrates, resulting in a smaller restriction of the catalytic cavity to substrate binding, although this speculation needs to be verified by the resolution of the crystal structure of the enzyme-substrate complexes.

      Reviewer #2 (Public review):

      Summary:

      Wei et al. present the X-ray crystallographic structures of two PL35 family glycosaminoglycan (GAG) lyases that display a broad substrate specificity. The structural data show that there is a high degree of structural homology between these enzymes and GAGases that have previously been structurally characterized. Central to this are the N-terminal (α/α)7 toroid domain and the C-terminal two-layered β-sheet domain. Structural alignment of these novel PL35 lyases with previously deposited structures shows a highly conserved triplet of residues at the heart of the active sites. Docking studies identified potentially important residues for substrate binding and turnover, and subsequent site-directed mutagenesis paired with enzymatic assays confirmed the importance of many of these residues. A third PL35 GAGase that is able to turn over alginate was not crystallized, but a predicted model showed a conserved active site Asn was mutated to a His, which could potentially explain its ability to act on alginate. Mutation of the His into either Ala or Asn abrogated its activity on alginate, providing supporting evidence for the importance of the His. Finally, a catalytic mechanism is proposed for the activity of the PL35 lyases. Overall, the authors used an appropriate set of methods to investigate their claims, and the data largely support their conclusions. These results will likely provide a platform for further studies into the broad substrate specificity of PL35 lyases, as well as for studies into the evolutionary origins of these unique enzymes.

      Strengths:

      The crystallographic data are of very high quality, and the use of modern structural prediction tools to allow for comparison of GAGase III to GAGase II/GAGase VII was nice to see. The authors were comprehensive in their comparison of the PL35 lyases to those in other families. The use of molecular docking to identify key residues and the use of site-directed mutagenesis to investigate substrate specificity was good, especially going the extra distance to mutate the conserved Asn to His in GAGase II and GAGase VII.

      Weaknesses:

      The structural models simply are not complete. A cursory look at the electron density and the models show that there are many positive density peaks that have not had anything modelled into them. The electron density also does not support the placement of a Mn2+ in the model. The authors indicate that ICP-MS was done to identify the metal, but no ICP-MS data is presented in the main text or supplementary. I believe the authors put too much emphasis on the possibility of GAGase III representing an evolutionary intermediate between GAG lyases and alginate lyases based on a single Asn to His mutation in the active site, and I don't believe that enough time was spent discussing how this "more open and shorter" catalytic cavity would necessarily mean that the enzyme could accommodate a broader set of substrates. Finally, the proposed mechanism does not bring the enzyme back to its starting state.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Minor points:

      (1) The number of significant digits used in Table 1 and Figure 3 legend are not justified. The authors should use a maximum of 2 significant digits.

      Thank you for your kind suggestion. We have verified the relevant data and retained two significant digits.

      (2) The authors should use the words "mutant" or "mutation" only when discussing DNA, but when discussing protein, the words "variant" and "substitution" should be used instead as these are more appropriate.

      Thank you for your helpful suggestions. We have revised the relevant description in the manuscript as you suggested.

      (3) Lines 102-110 are a long, run-on sentence that should be split into shorter sentences. Similarly, lines 367-378 should be split into shorter sentences.

      Thank you for your suggestions. In the revised manuscript, the long sentences in lines 102-110 and 367-378 have been rewritten into shorter ones.

      (4) Lines 174-175: His, Tyr, Glu, and Trp are not positively charged residues and this wording should be changed.

      Thank you for your suggestions. We have revised the relevant description in the manuscript as you suggested.

      (5) Lines 423-426 require a reference.

      Thank you for your suggestion. We have provided the reference at the right position and revised the relevant description in the manuscript as you suggested.

      (6) Grammar/language:

      -line 90 - change "should emerge" to "likely emerged"

      -line 145 - delete "Finally"

      -line 264 - delete "their"

      -line 265 - delete "active sites"

      -line 265-266 - change to "To confirm this hypothesis, site-directed mutagenesis followed by enzyme activity assay was performed"

      -line 311 - change "residue in the catalytic cavity of GAGase III, which.." to "residue in its catalytic cavity, which..."

      -line 318 - change "affect" to "affected"

      -line 323 - change to "degrading activity of GAGase II remains to be determined outside of the His188 residue"

      -line 345 - delete "assays"

      -line 359 - change to "evidence"

      -line 397 - change "folds" to "3D fold"

      -line 420 - change to "share similar catalytic sites"

      -lines 411, 433 - change "conversed" to "conserved"

      -line 441 - change to "Mutational analysis showed that the His188.."

      -line 450 - delete "which"

      Thank you for your suggestions. The grammatical errors have been corrected in the revised manuscript.

      Reviewer #2 (Recommendations for the authors):

      Major Concerns

      The electron density in your model clearly does not support the placement of a Mn ion. In the GAGase II structure, the placement of the Mn and the placement of waters around it still results in two density peaks of > 12 rmsd. The manuscript suggests that ICP-MS was done but the results of this are not shown anywhere. Please include your ICP-MS data. I see the structures have already been deposited, and if they have been deposited unchanged, please see if you can modify them to actually finish building the models. I don't find your data in Figure 2B particularly convincing that Mn is necessarily important for activity.

      Thank you for your kind comments. As is well known, ICP-MS is a common method for detecting metal ions within proteins (doi: 10.1016/j.jbc.2023.103047; doi: 10.1074/jbc.RA119.011790), and thus we used it to determine the type of atoms within GAGases in this study. In the revised manuscript, the ICP-MS data are presented in Supplemental Table S1, and they clearly show that the content of Mn<sup>2+</sup>, unlike that of other ions, is much higher in the test sample than in the negative control, suggesting the presence of Mn<sup>2+</sup> in the protein. We agree that the addition of Mn<sup>2+</sup>, like that of the other tested metal ions, does not strongly promote the activity of GAGase II, but the addition of EDTA significantly inhibited the enzyme activity (Figure 2), indicating that a metal ion such as Mn<sup>2+</sup> is necessary for the function of GAGases. Whether the metal ion participates in the catalytic reaction or only stabilizes the structure of the enzyme remains to be explored in future studies.

      Minor Concerns

      (1) Please include CC1/2 in your Table 1.

      Thank you for your kind suggestions. CC1/2 parameters have been added in the revised manuscript (Table 1).

      (2) If possible please include SDS-PAGE gel images of your purified proteins, particularly for the point mutations. Ideally, you would have done SEC on your mutants to show that the reduction in activity is not due to aggregation/misfolding, but at the very least I would like to see that you have similar levels of purity.

      Thank you for your kind suggestions. As you suggested, we have added SDS-PAGE gel images of purified GAGase II, GAGase III, GAGase VII, and their mutant enzymes to the supplementary data. As shown in Figure S5, site-directed mutagenesis did not affect the soluble expression levels of GAGase II, GAGase III or GAGase VII, indicating that the reduction in activity is not due to aggregation or misfolding. Due to the large number of variants, we used crude enzyme for the activity assays of the substrate binding sites, while for some key catalytic residues, we purified the corresponding mutant enzymes and then verified their activities by HPLC.

      (3) When referring to your structural predictions, it is not appropriate to say that you used Robetta. Your reference is correct though - you should say that the structures were predicted using RoseTTAfold.

      Thank you for your helpful suggestions. We have revised the relevant description in the manuscript.

      (4) If possible expand on how the shorter/more open active site cavity would result in broader substrate specificity.

      Thank you for your kind comment. In the revised manuscript, figures (Supplemental Figure S2) with surface representations of GAGase II and some representative structurally similar GAGs/alginate lyases, with the dimensions of the cavity labeled, were added to the supplementary data. Considering the correlation between enzyme specificity and substrate binding sites, we speculated that a shorter substrate binding cavity might allow the enzyme to accommodate a wider variety of substrates, resulting in a smaller restriction of the catalytic cavity to substrate binding. Unfortunately, we did not succeed in obtaining co-crystals of GAGases with any of the substrates. In future studies we will try to explain the mechanism of substrate selectivity by growing and resolving crystals of the enzyme-substrate complex or by other means.

      (5) I would put less emphasis on His188 in GAGase III being a strong indicator that this protein represents an evolutionary intermediate between alginate lyases and GAGases.

      Thank you for your comment. The His<sup>188</sup> residue, which is unique compared to other GAGases, is essential for the alginate-degrading activity of GAGase III. Regarding why GAGases are thought to represent a possible evolutionary intermediate between alginate lyases and GAG lyases, phylogenetic analysis demonstrated that GAGases show considerable homology with some identified GAG lyases and alginate lyases (DOI: 10.1016/j.jbc.2024.107466). The similarity in primary structure between some GAG lyases, alginate lyases, and GAGases suggests structural similarities, which are further supported by this study. As structure determines function, structural similarity is often used as a key criterion when studying the evolution of proteins. GAGase III, which shows significant GAG- and alginate-degrading activity, supports this speculation. Of course, our analysis of the evolutionary relationship between GAGases and identified GAG lyases and alginate lyases, based on structural comparison, is an attempt using existing methods. The conclusions we have drawn remain a hypothesis that requires further evidence to support and validate.

    1. Reviewer #1 (Public review):

      Summary:

      The manuscript under review investigates the role of periosteal stem cells (P-SSC) in bone marrow regeneration using a whole bone subcutaneous transplantation model. While the model is somewhat artificial, the findings were interesting, suggesting the migration of periosteal stem cells into the bone marrow and their potential to become bone marrow stromal cells. This indicates a significant plasticity of P-SSC consistent with previous reports using fracture models (Cell Stem Cell 29:1547, Dev Cell 59:1192).

      Major comments from previous round of review:

      (1) The authors assert that the periosteal layer was completely removed in their model, which is crucial for their conclusions. To substantiate this claim, it is recommended that the authors provide evidence of the successful removal of the entire periosteal stem cell (P-SSC) population. A colony-forming assay, with and without periosteal removal, could serve as a suitable method to demonstrate this.

      (2) The observation that P-SSCs do not express Kitl or Cxcl12, while their bone marrow stromal cell (BM-MSC) derivatives do, is a key finding. To strengthen this conclusion, the authors are encouraged to repeat the experiment using Cxcl12 or Scf reporter alleles. Immunofluorescence staining that confirms the migration of periosteal cells and their transformation into Cxcl12- or Scf-reporter-positive cells would significantly enhance the paper's key conclusion.

      (3) On page 8, line 20, the authors' statement regarding the detection of Periostin+ cells outside the periosteum layer could be misinterpreted due to the use of the periostin antibody. Given that periostin is an extracellular matrix protein, the staining may not accurately represent Periostin-expressing cells but rather the presence of periostin in the extracellular matrix. The authors should revise this section for greater precision.

      Comments on revised version:

      My comments from the previous round of review have mostly been addressed.

    2. Reviewer #2 (Public review):

      Summary:

      The authors have established a femur graft model that allows the study of hematopoietic regeneration following transplantation. They have extensively characterized this model, demonstrating the loss of hematopoietic cells from the donor femur following transplantation, with recovery of hematopoiesis from recipient cells. They also show evidence that BM MSCs present in the graft following transplantation are graft-derived. They have utilized this model to show that following transplantation, periosteal cells respond by first expanding, then giving rise to more periosteal SSCs, then migrating into the marrow to give rise to BM MSCs.

      Strengths:

      These studies are notable in several ways: 1) establishment of a novel femur graft model for the study of hematopoiesis; 2) Use of lineage tracing and surgery models to demonstrate that periosteal cells can give rise to BM MSCs.

      Weaknesses:

      There are a few weaknesses. First, the authors do not definitively demonstrate the requirement of periosteal SSC movement into the BM cavity for hematopoietic recovery. Hematopoiesis recovers significantly before 5 months, even before significant P-SSC movement has been shown, and hematopoiesis recovers significantly even when periosteum has been stripped. Second, it is not clear how the periosteum is changing in the grafts. Which cells are expanding is unclear, and it is not clear if these cells have already adopted a more MSC-like phenotype prior to entering the marrow space. Indeed, given the presence of host-derived endothelial cells in the BM, these studies are reminiscent of prior studies from this group and others that re-endothelialization of the marrow may be much more important for determining hematopoietic regeneration, rather than P-SSC migration. Third, the studies exploring the preferential depletion of BM MSCs vs P-SSCs are difficult to interpret. The single metabolic stress condition chosen was not well-justified, and the use of purified cell populations to study response to stress ex vivo may have introduced artifacts into the system.

      Comments on the current version: The authors have addressed my concerns adequately.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      The manuscript under review investigates the role of periosteal stem cells (P-SSC) in bone marrow regeneration using a whole-bone subcutaneous transplantation model. While the model is somewhat artificial, the findings were interesting, suggesting the migration of periosteal stem cells into the bone marrow and their potential to become bone marrow stromal cells. This indicates a significant plasticity of P-SSC consistent with previous reports using fracture models (Cell Stem Cell 29:1547, Dev Cell 59:1192).

      Major Concerns

      (1) The authors assert that the periosteal layer was completely removed in their model, which is crucial for their conclusions. To substantiate this claim, it is recommended that the authors provide evidence of the successful removal of the entire periosteal stem cell (P-SSC) population. A colony-forming assay, with and without periosteal removal, could serve as a suitable method to demonstrate this.

      We are grateful to the reviewer for this valuable suggestion. The objective of this experiment was to demonstrate that periosteal ablation impairs bone marrow regeneration, a finding that is supported by our results. We expect that ablation of the periosteum would be associated with only a partial decrease in CFU-F activity, given the presence of MSCs in the bone and in the endosteal region of the bone marrow. Therefore, CFU-F assays would be difficult to interpret in this setting. In view of the phenotype obtained, providing proof of concept of the importance of the periosteum, we do not believe that further experiments would strengthen the level of proof of this experiment.

      (2) The observation that P-SSCs do not express Kitl or Cxcl12, while their bone marrow stromal cell (BM-MSC) derivatives do, is a key finding. To strengthen this conclusion, the authors are encouraged to repeat the experiment using Cxcl12 or Scf reporter alleles. Immunofluorescence staining that confirms the migration of periosteal cells and their transformation into Cxcl12- or Scf-reporter-positive cells would significantly enhance the paper's key conclusion.

      Transplantation of periosteum isolated from Cxcl12 or Scf reporter mice into WT bones is an excellent suggestion. Indeed, this experiment would confirm (1) the migration of periosteal SSCs and (2) the expression of Cxcl12 and Scf by BM-MSCs derived from the periosteum. However, current limitations in available resources preclude the execution of these experiments. Moreover, the use of PostnCre<sup>ER</sup>;Tmt mice represents the optimal approach for tracking and specifically isolating BM-MSCs derived from the periosteum. The expression of Cxcl12 and Scf by BM-MSCs derived from the periosteum has been demonstrated in 2 distinct experimental models (Figures 5 and 6).

      (3) On page 8, line 20, the authors' statement regarding the detection of Periostin+ cells outside the periosteum layer could be misinterpreted due to the use of the periostin antibody. Given that periostin is an extracellular matrix protein, the staining may not accurately represent Periostin-expressing cells but rather the presence of periostin in the extracellular matrix. The authors should revise this section for greater precision.

      We acknowledge and appreciate the reviewer's attention to detail. This is, in fact, an error. Nestin-GFP positive periosteal SSC are seen within the periosteum marked by an anti-periostin antibody labeling the extracellular matrix of the periosteum. The manuscript has been revised to address this inaccuracy on page 9, lines 8-9.

      Reviewer #2 (Public review):

      Summary:

      The authors have established a femur graft model that allows the study of hematopoietic regeneration following transplantation. They have extensively characterized this model, demonstrating the loss of hematopoietic cells from the donor femur following transplantation, with recovery of hematopoiesis from recipient cells. They also show evidence that BM MSCs present in the graft following transplantation are graft-derived. They have utilized this model to show that following transplantation, periosteal cells respond by first expanding, then giving rise to more periosteal SSCs, and then migrating into the marrow to give rise to BM MSCs.

      Strengths:

      These studies are notable in several ways:

      (1) Establishment of a novel femur graft model for the study of hematopoiesis;

      (2) Use of lineage tracing and surgery models to demonstrate that periosteal cells can give rise to BM MSCs.

      We thank the reviewer for noting the novelty of our manuscript.

      Weaknesses:

      There are a few weaknesses. First, the authors do not definitively demonstrate the requirement of periosteal SSC movement into the BM cavity for hematopoietic recovery. Hematopoiesis recovers significantly before 5 months, even before significant P-SSC movement has been shown, and hematopoiesis recovers significantly even when periosteum has been stripped.

      This is an important point. Notably, we can see expansion of P-SSCs by day 8 after femur transplantation and evidence of periosteum-derived SSCs in the bone marrow by day 15, before we can detect any significant hematopoietic recovery (see Figure 3A-C).

      Second, it is not clear how the periosteum is changing in the grafts. Which cells are expanding is unclear, and it is not clear if these cells have already adopted a more MSC-like phenotype prior to entering the marrow space.

      This is an interesting question. To examine early changes in gene expression in periosteal SSCs in grafted femurs, we performed additional RNA sequencing at an earlier time point, 3 days after femur transplantation, comparing host periosteal SSCs, periosteal SSCs from grafted femurs, and host bone marrow MSCs (see new Supplementary Figure S5 A-C). At this time point the three cell populations are already distinct on the PCA plot (Figure S5A), and there is downregulation of some periosteal genes in the graft P-SSCs (Figure S5B). However, we do not yet see upregulation of Kitl or Cxcl12 or most other BM MSC genes in graft P-SSCs at this time point (Figure S5B). Furthermore, gene set enrichment analysis (GSEA) revealed upregulation of cell cycle, DNA replication and mismatch repair gene signatures, and downregulation of multiple gene signatures compared to host P-SSCs (Figure S5C). Therefore, we conclude that P-SSCs already adopt some gene expression changes early after femur transplantation, but have not yet fully differentiated into BM MSCs at this early time point. This experiment is now discussed on p.10 of the revised manuscript.

      Indeed, given the presence of host-derived endothelial cells in the BM, these studies are reminiscent of prior studies from this group and others showing that re-endothelialization of the marrow may be much more important for determining hematopoietic regeneration than P-SSC migration.

      Indeed, as previously shown by our group and others, we agree that endothelial regeneration and re-endothelialization may also play an important role in this bone marrow regeneration model. It is noteworthy that this model has the potential to serve as a valuable tool for analyzing the origin of BM endothelial cells during regeneration processes. To further illustrate the endothelial regeneration, additional images of bone sections from VE-cadherin-cre;TdTomato grafted femurs at 15 days, one month, and five months post-transplantation have been included in the new Figure S3. These images reveal extensive vascularization of the graft and proximity of UBC-GFP+ donor-derived vessels to VE-cadherin+ host-derived blood vessels in the bone marrow within one month (see Figure S2C). This observation is consistent with the timing of both BM MSC recovery and HSC recovery in the grafts, thereby suggesting the importance of endothelial recovery (see Fig. 1B). A new discussion of these findings has been included on page 6 of the revised manuscript and on page 16 in the discussion section.

      Third, the studies exploring the preferential depletion of BM MSCs vs P-SSCs are difficult to interpret. The single metabolic stress condition chosen was not well-justified, and the use of purified cell populations to study response to stress ex vivo may have introduced artifacts into the system.

      We chose to focus on hypoxia as the main condition in which to analyze the stress response of P-SSCs vs BM MSCs because we reasoned that due to the location of P-SSCs on the outside of the bone, these cells would be exposed to a higher oxygen tension than BM-MSCs, which are located within the bone marrow. Therefore, we wanted to determine whether this exposure to a different oxygen tension would be sufficient to explain the different properties of P-SSCs and BM MSCs. We modified the text on p.11 of the manuscript to explain the rationale for this experiment better.

      Reviewer #3 (Public review):

      Summary:

      Marchand, Akinnola, et al. describe the use of a novel model to study BM regeneration. Here, they harvest intact femurs and subcutaneously graft them into recipient mice. Similar to standard BM regeneration models, there is a rapid decrease in cellularity followed by a gradual recovery over 5 months within the grafts. At 5 months, these grafts have robust HSC activity, similar to HSCs isolated from the host femur. They find that periosteum skeletal stem cells (p-SSCs) are the primary source of BM-MSCs within the grafted femur and that these cells are more resistant to the acute stress of grafting the femur.

      Strengths:

      This is an interesting manuscript that describes a novel model to study BM regeneration. The model has tremendous promise.

      We thank the reviewer for highlighting the novelty and potential of our work.

      Weaknesses:

      The authors claim that grafting intact femurs subcutaneously is a model of BM regeneration and can be used as a replacement for gold standard BM regeneration assays such as sublethal chemo/irradiation. However, there isn't enough explanation as to how this model is equivalent or superior to the traditional models. For instance, the authors claim that this model allows for the study of "BM regeneration in vivo in response to acute injury using genetic tools." This can and has been done numerous times with established, physiologically relevant BM regeneration models. The onus is on the authors to discuss or perform the necessary experiments to justify the use of this model. For example, standard BM regeneration models involve systemic damage that is akin to therapies that require BM regeneration. How is studying the current model that provides only an acute injury more relevant and useful than other models? As it stands, it seems as if the authors could have done all the experiments demonstrating the importance of these p-SSCs in the traditional myelosuppressive BM regeneration models to be more physiologically relevant. Along these lines, the use of a standard BM regeneration model (e.g., sublethal chemo/irradiation) as a critical control is missing and should be included. Even if the control doesn't demonstrate that p-SSCs can contribute to the BM-MSC during regeneration, it will still be important because it could be the justification for using the described model to specifically study p-SSCs' regulation of BM regeneration.

      We appreciate the reviewer raising this important point. We never intended this femur transplantation model of bone marrow injury to replace more established models, such as chemotherapy or irradiation. In fact, we compared the effects of femur transplantation to localized bone irradiation on P-SSCs using our Periostin-Cre;Td-Tomato lineage tracing model. We found that irradiation does not induce the same migration of Tomato+ P-SSCs from the periosteum to the bone marrow cavity that femur transplantation does, and cannot be used to demonstrate the plasticity of P-SSCs in the same way (see new Supplementary Figure S7D-E). Therefore, femur transplantation appears to be a more severe form of bone marrow injury, and is not similar to other more established assays of bone marrow injury. We also added this discussion to the revised manuscript on p.14 and in the discussion section on p.17.

      The authors perform some analysis that suggests that grafting a whole femur mimics BM regeneration, but there are many experiments missing from the manuscript that will be necessary to support the use of this model. To demonstrate that this new model mimics current BM regeneration models, the authors need to perform a careful examination of the early kinetics of hematopoietic recovery post-transplant. Complete blood counts should be performed on the grafts, focusing on white blood cells (particularly neutrophils), red blood cells, platelets, all critical indicators of BM regeneration. This analysis should be done at early time points that include weekly analysis for a minimum of 28 days following the graft. Additionally, understanding how and when the vasculature recovers is critical. This is particularly important because it is well-established that if there is a delay in vascular recovery, there is a delay in hematopoietic recovery. As mentioned above, a standard BM regeneration model should be used as a control.

      We concur with the reviewer that hematopoietic recovery is a pivotal aspect of this model. We conducted a time-course analysis of bone marrow and HSC cellularity from day 0 to month 5 post-transplantation (Figure 1B). Furthermore, we evaluated the HSC capacities through bone marrow transplantation from grafted or host femurs (Figures 1D and 1E) and quantified the various hematopoietic cells in the graft after five months (Supplemental Figure 1). In addition, hematopoiesis occurring in the transplanted bone was comprehensively evaluated in another article, currently in revision and available on bioRxiv (Takeishi, S., Marchand, T., Koba, W. R., Borger, D. K., Xu, C., Guha, C., Bergman, A., Frenette, P. S., Gritsman, K., & Steidl, U. (2023). Haematopoietic stem cell numbers are not solely determined by niche availability. bioRxiv: the preprint server for biology, 2023.10.28.564559. https://doi.org/10.1101/2023.10.28.564559). We did not use another assay of bone marrow regeneration as a “control”, since we do not expect to see similar plasticity of periosteal SSCs in these models, such as with the localized irradiation model described in the new Figure S7D-E.

      We agree with the reviewer that endothelial recovery is also likely to be very important for hematopoietic recovery in this model, but this was not the focus of this manuscript. The process of endothelial recovery is likely to be more complex than that of MSC recovery, as our findings indicate that the graft endothelium can arise from both the host and the graft femur (see Fig. 2D). Consequently, further investigation into the mechanisms of endothelial recovery and its contribution to hematopoiesis in this experimental system will be an interesting focus of future work. We believe that this bone transplantation model represents a valuable tool for addressing questions regarding the origin and regeneration mechanisms of bone marrow endothelial cells.

      The contribution of donor and host cells to the BM regeneration of the graft is interesting, particularly the chimerism of the vasculature. One can assume that for the graft to undergo BM regeneration, there needs to be the delivery of nutrients into the graft via the vasculature. The chimerism of the vascular network suggests that host endothelial cells anastomose with the graft. Host mice should have their vascular system labeled with a dye such as dextran to determine if anastomosis has occurred. If not, the authors need to explain how this graft survives up to 5 months. If anastomosis does occur, then it is very surprising that the hematopoietic system of the graft is not a chimera because this would essentially be a parabiosis model. This needs to be explained.

      We have included additional images of bone sections from VE-cadherin-cre;tdTomato grafted femurs at 15 days, one month, and five months post transplantation in the new Figure S3. These images show extensive vascularization of the graft and proximity of UBC-GFP+ donor-derived vessels to VE-cadherin+ host-derived blood vessels in the bone marrow within one month, suggesting a potential anastomosis (Figure S2C). However, it is not surprising that hematopoiesis arises exclusively from the host, as we observed complete death of the hematopoietic cells and BM MSCs in the graft femur within the first 3 days of femur transplantation (see Figure S1A), and we do not see any significant hematopoietic recovery in the grafts until at least 2 months (see Fig.1B). Therefore, this is not similar to a parabiosis model, as confirmed by our chimerism studies shown in Figure 2D. In addition, these data are consistent with the results reported with the use of ossicles (doi:10.1038/nature09262; DOI 10.1016/j.cell.2007.08.025; doi:10.1038/nature07547).

      Most of the data presented for the resistance of p-SSCs to stress suggests DNA damage response. Do p-SSCs demonstrate a higher ability to resolve DNA damage? Do they accumulate less DNA damage? Staining for DNA damage foci or performing comet assays could be done to further define the mechanism of stress resistance properties of p-SSCs.

      This is an interesting question. In our RNA sequencing analysis of graft P-SSCs compared with host P-SSCs we did observe an upregulation of mismatch repair gene signatures by gene set enrichment analysis (GSEA) (new Figure S5C). Therefore, it is possible that P-SSCs do have an altered DNA damage response. However, we are unable to investigate this further at this time.

      Given the importance of BM-MSCs in hematopoiesis and that the majority of the emerging BM-MSCs appear to be derived from p-SSCs, the authors should perform experiments to determine if p-SSC-derived BM-MSCs are critical regulators of BM regeneration. For example, the authors could test this by crossing the Postn-creER mice with iDTR mice to ablate these cells and see if recovery is inhibited or delayed. This should be done with the described periosteum-wrapped femur graft model as well as a control BM regeneration model. Demonstrating that the deletion of these cells affects BM regeneration in both models would further justify the physiological relevance and utility of the femur graft model.

      We thank the reviewer for this excellent suggestion, and we agree that this is an important experiment. However, our attempts to ablate Postn+ cells using the iDTA system were limited by technical difficulties, which we are unable to address at this time.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) In Figure 2C, the vascular network staining appears to be duplicated, suggesting a possible error in image capture. The authors should replace this image with a different field or an alternative picture to avoid confusion.

      We thank the reviewer for noting this accidental duplication due to an image stitching problem. Figure 2C was replaced by a different image from the same experiment.

      (2) For consistency and clarity, a scale bar should be included in Figure S3E to indicate that the magnification factors of the respective visual fields are identical.

      We thank the reviewer for highlighting this point. The magnification used has been added in the revised Figure.

      (3) In Figure S5B, the difference in normalized Opn mRNA expression relative to Gapdh between steady-state BM-MSCs and P-SSCs seems substantial, which contradicts the "ns" (not significant) label. The authors should verify the accuracy of this labeling.

      We agree with the reviewer that this difference in what is now Figure S6B looks substantial. However, we confirmed that this difference is not statistically significant, likely due to the high variability between replicates in Opn expression in the steady state BM MSCs.

      Reviewer #2 (Recommendations for the authors):

      In order to strengthen the argument that P-SSCs are necessary for hematopoietic recovery, the authors should consider providing the following data:

      (1) In the periosteal stripping experiments, the authors should show if periosteum-derived MSCs are present in the BM throughout the process of hematopoietic recovery (not just at the end of the experiment). If none are present at the end, that would mean that periosteum is not required for hematopoietic recovery, but would still suggest that it is required for optimal hematopoietic recovery. At early time points, it would also be very helpful to demonstrate the composition and amount of endothelium present in the marrow to determine if P-SSC migration and differentiation into MSCs depends on endothelial reconstitution.

      To further examine the vascularization of the transplanted femur at an earlier time point, we have added additional images of grafted femur from VE-cadherin-cre;tdTomato at 15 days and one month post transplantation in the new Figure S3A and S3B. These images already show extensive vascularization of the graft periosteum stained with an anti-periostin antibody. In addition, we observed anastomoses of host VE-cadherin;Tmt+ blood vessels with graft ubc-GFP+ blood vessels in the grafted periosteum within one month (Figure S3C).

      (2) Studies of the surgical periosteum grafts could benefit from histologic analysis of the BM and its MSC components at earlier time points following grafting since the data provided are only at 5 months. Such studies would allow a better appreciation of the relationship between P-SSC migration into the marrow and hematopoietic recovery.

      We have performed histologic analysis of grafted femurs at multiple early time points, which shows expansion of P-SSCs and their migration into the bone marrow cavity (Figure 3C).

      (3) Studies of stress responses preferably should be performed using intact bone and should characterize P-SSC and BM MSC apoptosis, cell cycle status, differentiation, etc, immediately following shifts to the stress conditions. These studies would be more compelling if performed using additional "stress" conditions likely to represent the graft environment.

      This is an interesting suggestion. However, these types of studies would not be possible in intact bones ex vivo, as P-SSCs are known to migrate out of the bone in culture.

    1. eLife Assessment

      This important work advances our understanding of how the SARS-CoV-2 Nsp16 protein is regulated by host E3 ligases to promote viral mRNA capping. Support for the overall claims in the revised manuscript is convincing. This work will be of interest to those working in host-viral interactions and the role of the ubiquitin-proteasome system in viral replication.

    2. Reviewer #1 (Public review):

      In this study, Tian et al. explore the role of ubiquitination of non-structural protein 16 (nsp16) in the SARS-CoV-2 life cycle. nsp16, in conjunction with nsp10, performs the final step of viral mRNA capping through its 2'-O-methylase activity. This modification allows the virus to evade host immune responses and protects its mRNA from degradation. The authors demonstrate that nsp16 undergoes ubiquitination and subsequent degradation by the host E3 ubiquitin ligases UBR5 and MARCHF7 via the ubiquitin-proteasome system (UPS). Specifically, UBR5 and MARCHF7 mediate nsp16 degradation through K48- and K27-linked ubiquitination, respectively. Notably, degradation of nsp16 by either UBR5 or MARCHF7 operates independently, with both mechanisms effectively inhibiting SARS-CoV-2 replication in vitro and in vivo. Furthermore, UBR5 and MARCHF7 exhibit broad-spectrum antiviral activity by targeting nsp16 variants from various SARS-CoV-2 strains. This research advances our understanding of how nsp16 ubiquitination impacts viral replication and highlights potential targets for developing broadly effective antiviral therapies.

      Strengths:

      The proposed study is of significant interest to the virology community because it aims to elucidate the biological role of ubiquitination in coronavirus proteins and its impact on the viral life cycle. Understanding these mechanisms will address broadly applicable questions about coronavirus biology and enhance our overall knowledge of ubiquitination's diverse functions in cell biology. Employing in vivo studies is a strength.

      Weaknesses:

      Minor comments:

      Figure 5A: The authors should ensure that the figure is properly labeled to clearly distinguish between the IP (Immunoprecipitation) panel and the input panel.

    3. Reviewer #3 (Public review):

      Summary:

      The manuscript "SARS-CoV-2 nsp16 is regulated by host E3 ubiquitin ligases, UBR5 and MARCHF7" is an interesting work by Tian et al. describing the degradation/stability of NSP16 of SARS-CoV-2 via K48- and K27-linked ubiquitination and proteasomal degradation. The authors have demonstrated that UBR5 and MARCHF7, both E3 ubiquitin ligases, bring about the ubiquitination of NSP16. The concept and the experimental approach used to test the hypothesis look ok. The in vivo data look ok with the controls. Overall, the manuscript is good.

      Strengths:

      The study identified important E3 ligases (MARCHF7 and UBR5) that can ubiquitinate NSP16, an important viral factor.

      Comments on revisions:

      I have gone through the revised form of the manuscript thoroughly. The authors have addressed all of my concerns. To me, the experimental approach looks convincing that the host E3 ubiquitin ligases (UBR5 and MARCHF7) ubiquitinate NSP16 and mark it for proteasomal degradation via K48- and K27-linkage. The authors have represented the final figure (Fig. 8) in a convincing manner, opening a new window to explore the mechanism of capping the vRNA by NSP16.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Major comments:

      (1) In Figure 1 the authors could reference and use NSP8 (PMID: 38275298) and Nucleocapsid (PMID: 37185839) in their experiments as positive controls.

      Thank you for your suggestion! In Figure 1A, during our screening of SARS-CoV-2 nsp proteins regulated by MG132, we confirmed that nsp8 can also be restored by MG132. This finding indicates that nsp8 is degraded via the proteasome pathway and can therefore serve as a positive control for the experiment. It has been reported that nsp8 undergoes degradation via the ubiquitin-proteasome pathway following its ubiquitination mediated by TRIM22. We have added the description at line 115 in the manuscript.

      (2) The data indicating that NSP16 is ubiquitinated come from overexpression systems, and it is possible that NSP16 ubiquitination only occurs in expression contexts, not during coronavirus infection. If NSP16 ubiquitination can't be measured in the context of infection, it is unclear how we can make any conclusions. The authors need to demonstrate the ubiquitination of NSP16 in the context of viral infection.

      We greatly appreciate the reviewer's suggestion and have incorporated the corresponding experimental results. As shown in Figure 5A, co-IP experiments using an endogenous nsp16 antibody were conducted following infection with the SARS-CoV-2 Wuhan strain. These experiments confirmed that the nsp16 protein encoded by the virus undergoes ubiquitination in infected cells. This finding demonstrates the ubiquitination of nsp16 in a biological context, thereby supporting the conclusions drawn from our overexpression experiments.

      (3) In Figure 4, adding controls will strengthen the authors' conclusion.

      a) Is it possible to observe ubiquitination of NSP16 by transfecting in NSP16-FLAG tagged, immunoprecipitate NSP16, run a western blot, and probe for endogenous ubiquitin?

      b) Can the authors please include an empty vector control as well as WT ubiquitin in these panels for comparison?

      c) In addition, why are the ubiquitination patterns different in the IP panels of D and E vs B? Without an empty vector control, it is challenging to conclude what the background is.

      Thank you for your valuable suggestions! We have made the following changes and additions in response to your comments:

      a) We have conducted the experiments as per the reviewer's suggestion. Figure 3B shows the result. Co-IP experiments were performed, and endogenous ubiquitination of nsp16 was observed using the endogenous ubiquitin antibody.

      b) We apologize for previously focusing solely on presenting multiple ubiquitin mutants on a single panel of nsp16 IP without considering the inclusion of an empty vector control and WT ubiquitin. The experiment has been redesigned and conducted, and the results are now presented in Figures 3E and 3F.

      c) The differences in the ubiquitination patterns observed between the IP panels in Figures 3E and 3F compared to 3C may be due to the use of different plasmids and antibodies and to differences in exposure depth. To address this, we have standardized the plasmids in the figure and included an empty vector control as a negative control to clarify the background signal.

      (4) Overexpression of the ubiquitin mutants may have an indirect effect on protein homeostasis. The authors can also utilize linkage-specific antibodies in their studies to elucidate the ubiquitin linkage associated with NSP16 ubiquitination. K63-linkage Specific Polyubiquitin (D7A11) Rabbit mAb, 5621S, and K48-linkage Specific Polyubiquitin (D9D5) Rabbit mAb, 8081S from Cell Signaling Technologies?

      We greatly appreciate the reviewer's excellent suggestion! Using linkage-specific antibodies to elucidate the ubiquitin linkage associated with nsp16 ubiquitination would indeed provide more direct evidence. However, due to the long lead time for obtaining these antibodies, we plan to conduct further verification in future experiments.

      (5) The authors discussed the subcellular localization of overexpressed NSP16- showing the localization of NSP16 in the context of viral infection would strengthen the study. If this is challenging, can the authors express NSP16 along with the co-factor NSP10 and examine its subcellular localization?

      Thank you for your suggestion! During viral infection, we observed the ubiquitination of the nsp16 protein through co-IP experiments, indicating that the presence of nsp10 does not influence the regulation of nsp16 ubiquitination by MARCHF7 or UBR5 (Figure 5A). Therefore, we believe that investigating the co-localization of nsp10 and nsp16 would not provide additional value to our results. Additionally, through a literature review, we found studies that have already examined the localization of nsp10 and nsp16 following viral infection. These studies revealed that nsp10 was located in the cytoplasm, while nsp16 can be detected in both the nucleus and cytoplasm (PMID: 33080218; PMID: 34452352). This observation is consistent with the localization of nsp16 that we observed in our overexpression experiments.

      (6) a) In Figure 3A, the authors should note that the interaction of NSP16 appears weak with UBR5. The authors should confirm that the interaction of NSP16 and the E3 ligases is relevant in the context of viral infection.

      b) In Figure 3B, the scale bars should be labeled in at least one panel, as well as in the legend.

      c) The authors discussed nuclear localization of MARCHF7, UBR5, and NSP16, therefore a control with a nuclear stain should be included in this figure to enhance the study.

      d) Some panels look overexposed while others are blurry, which decreases the robustness of the interaction as the authors stated in line 191. To strengthen the results of Figure 3, consider GST purification and in vitro, cell-free binding assays to confirm a direct interaction between nsp16 and the E3 ligases.

      Thank you for the reviewer’s thoughtful suggestions! We have made the following changes and adjustments based on your recommendations:

      a) On the interaction between nsp16 and UBR5:

      The interaction between nsp16 and UBR5 appears to be weak, possibly due to the large size of the UBR5 protein (300 kDa). As a result, there are challenges in presenting the experimental results, including difficulties in both expression and protein level detection. To further confirm the relevance of the interaction between nsp16 and the E3 ligases in the context of viral infection, we have performed experiments, and the results are presented in Figure 5A.

      b) On scale bars:

      The issue regarding the scale bars in Figure 4 has been addressed, and we have now included them in the figure legend for clarity (Line 885).

      c) On nuclear localization control:

      For the localization of MARCHF7, UBR5, and nsp16 in Figure 4C, given that both MARCHF7 and UBR5 are tagged with CFP, DAPI staining would result in spectral overlap. However, we conducted co-localization experiments for MARCHF7 or UBR5 with nsp16 in Figure 4—figure supplements 1E and 1F, where DAPI staining was included to illustrate the localization of these three proteins. Our experiments showed that while these proteins are present in both the nucleus and cytoplasm, they are predominantly localized in the cytoplasm.

      d) On validation of direct interaction:

      We attempted GST purification and in vitro cell-free binding assays to verify the direct interaction between nsp16 and the E3 ligases. However, UBR5 and MARCHF7 are both large proteins, with UBR5 being particularly large, which significantly increased the difficulty of purification. Additionally, we faced challenges in purifying nsp16, as the purified nsp16 protein tended to aggregate. We will continue to optimize purification techniques and conditions in future experiments.

      We appreciate your valuable comments, which have greatly contributed to improving our experiments and conclusions.

      (7) To confirm the knockdown of the E3 ligases by siRNA, the authors should use western blotting to show the presence/absence/decrease of the protein levels in addition to mRNA levels by RT-PCR. The authors have the lysates, and they have shown that the antibodies for MARCHF7 and UBR5 work; therefore, including this throughout the manuscript would help substantiate the authors' conclusions.

      Thank you for the reviewer’s valuable suggestion! We have validated the knockdown efficiency at the protein level for the experiments involving siRNA knockdown. Corresponding Western blot images are now included in the relevant experiments to substantiate our conclusions, in addition to the RT-PCR data, including Figures 2, 4 and 5.

      (8) In the overexpression studies of the E3 ligases with viral infection in Figure 5, the authors should include the catalytic mutants for the E3 ligases with the nsp16 gradient experiment. This would strengthen the conclusion of the studies.

      Thank you for the reviewer’s suggestion! We have conducted the relevant experiments based on your recommendation, and the corresponding data are presented in the Figure 6—figure supplements 2A-H. These results strengthen the conclusions of our study.

      (9) Figure 5: For C and F, for a better comparison of the efficacy against the 2 strains, the authors should use the same scale. This could benefit from a kinetics experiment.

      Thank you for the reviewer’s suggestion! We have made revisions in Figures 5E and 5H in response to your recommendation.

      (10) Is there a synergistic effect of double E3 knockdown on viral replication?

      Thank you for the reviewer’s question! In Figures 5—figure supplement 1A-B, we conducted experiments by individually and simultaneously knocking down MARCHF7 or UBR5, followed by infection with viral SARS-CoV-2 transmissible virus-like particles. The results revealed that simultaneous knockdown further enhances viral replication, demonstrating a synergistic effect.

      (11) In lines 98-100 the authors state "This dual targeting by MARCHF7 and UBR5 impairs the 2'-O-MTase activity of nsp16, blocking the conversion of cap-0 to cap-1 at the 5 'end of viral RNA, ultimately exhibiting potent antiviral activity against SARS-CoV-2". The authors did not examine the 2'-O-MTase activity of nsp16. The authors should rephrase this or provide the data if this experiment was done.

      Thank you for the reviewer’s valuable suggestion! Based on your comment, we have revised the ambiguous wording located in lines 100-104.

      (12) In the discussion, the authors reported that elucidating a specific lysine residue (s) that is ubiquitinated was challenging and stated that they generated multiple mutants including truncated mutants, and wrote "data not shown". The authors need to include this data as supplementary.

      Thank you for the reviewer’s suggestion! Based on your comment, we have included the data regarding the specific lysine residue(s) that is ubiquitinated, along with the truncated mutants, as supplementary data (Appendix-figure S2).

      (13) In Figure 7, the authors showed a copy number of SARS CoV-2 E in lung tissue. The authors should show viral titers using either the plaque assay or the TCID50 assay.

      Thank you for the reviewer’s suggestion! Based on your comment, we measured the TCID50 of the virus in the lung tissue homogenates, and the results are presented in Figure 7D.

      Minor comments:

      (1) Line 76: while many E3 ubiquitin ligases directly recognize and bind to their target substrates, cullin-RING ligases directly bind an adaptor, which binds a substrate receptor and/or the substrate directly, while the RING-box protein binds a different surface of the cullin and is also not directly interacting with substrate.

      Thank you for the reviewer’s valuable suggestion! Based on your comment, we have revised the ambiguous wording in line 76.

      (2) Line 161: having introduced the suggestion that NSP16 is ubiquitinated by these ligases, consider moving Figure 4 to the Figure 3 spot.

      Based on your comment, we have rearranged the order of the figures and moved Figure 4 to the Figure 3 spot.

      (3) Figure 2: Can the authors please do +/- MG132 for each siRNA? It is possible that the lanes where we don't see NSP16 were because there was no NSP16 expressed, OR it was degraded, MG132 would confirm one or the other.

      Thank you for the reviewer’s suggestion! Based on your comment, we have redesigned the experiment and included the MG132 treatment for each siRNA. The results are presented in Figure 2A.

      (4) Line 165: The authors write "As confirmed by MS, both Myc-tagged MARCHF7 and endogenous UBR5 interact with nsp16, as seen in the Co-IP experiment" should be the reverse, MS suggests NSP16-E3 interaction, the co-ip confirms this.

      Based on your comment, we have revised the wording in line 183 to ensure accuracy. MS suggests the interaction between nsp16 and the E3 ligases, while the Co-IP experiment confirms this interaction.

      (5) Line 178: the cited paper doesn't clearly show NSP16 nuclear localization, nor do the authors of said paper claim that they found it there. It is cytoplasmic. Additionally, said paper used overexpression, and it is unclear if NSP16 is nuclear in the context of viral infection.

      Thank you for the reviewer’s suggestion! The referenced paper states, "As can be seen in the Supplementary Fig. S2, the viral proteins are either cytoplasmic (NSP2, NSP3C, NSP4, NSP8, Spike, M, N, ORF3a, ORF3b, ORF6, ORF7a, ORF7b, ORF8, ORF9b, and ORF10) or both nuclear and cytoplasmic (NSP1, NSP3N, NSP5, NSP6, NSP7, NSP9, NSP10, NSP12, NSP13, NSP14, NSP15, NSP16, E, and ORF9a)," indicating that nsp16 is localized in both the nucleus and cytoplasm. Upon reviewing the literature, we found that the paper (PMID: 33080218) reports the distribution of nsp16 protein following viral infection. The results indicate that nsp16 is present in both the nucleus and cytoplasm, although the authors of the referenced paper claim that nsp16 was located in the nucleus.

      (6) Line 197: in addition to the 7 lysine residues, ubiquitin can also form linear N-terminal linkages.

      Thank you for the reviewer’s suggestion! Linear N-terminal ubiquitination, with its distinct linkage and substrate recognition mechanism, is typically mediated by a complex consisting of the E3 ubiquitin ligases HOIL-1 and HOIP, and differs from classical ubiquitination. Therefore, this type of ubiquitin chain was not investigated in our experiments.

      (7) Line 202: Authors state "Interestingly, all single-lysine Ub mutants promoted nsp16 ubiquitylation to varying degrees, indicating a complex polyubiquitin chain structure on nsp16 potentially regulated by multiple E3 ligases". However, not all the mutants. K33 isn't supported by the blot.

      Thank you for pointing that out! Indeed, we made an error in our description. The K33 mutant did not promote nsp16 ubiquitylation, and we have corrected this in the manuscript accordingly in line 173.

      (8) Line 204: consider including "E2-E3 ligase pairs" for RING ligases the E2 determines the linkage type see: Cell Research (2016) 26:423-440.

      Thank you for your suggestion! We have included the term "E2-E3 ligase pairs" in the article in line 176.

      (9) Line 235: The authors used the real virus, the inclusion of the BLS2 virus here is extraneous, it doesn't add anything. The authors can consider removing it.

      Thank you for your suggestion! In our experiments, we performed simultaneous knockdown of two E3 ligases, so we believe this data is relevant and should not be removed.

      (10) Line 238: Authors state: "led to a significant increase in SARS-CoV-2 levels compared to the control group". What is meant by "levels?"

      Thank you for your careful reading. We have updated "levels" to "replication" as suggested to clarify the meaning in line 237.

      (11) Line 245: increased titers. This could be improved for specificity by saying, 1-log increase for example.

      Thank you for the reviewer's valuable suggestions. We have made the necessary changes and specified "increased titers" as a "1-log increase" in lines 249 and 261.

      (12) Line 249: in Figure 5H again, the authors are showing relative mRNA levels. Ideally should show protein levels by western blot.

      Thank you for the reviewer's suggestion! We have performed protein-level detection of the knockdown efficiency for the samples, and the bands have been placed in the corresponding positions in Figure 5I.

      (13) Line 259: "strongly linked to their ability to modulate..." This appears to be an overextension of the data. The data show nsp16 levels can compensate for E3 overexpression, but not that the E3 ligases are modulating this activity. We can infer this from previous experiments. Perhaps increasing the NSP12 levels would also have the same effect as they don't show that this is specific to NSP16. What about a catalytically dead E3?

      Thank you for the reviewer's thoughtful suggestion. We have revised the wording accordingly and designed the viral-related experiments with E3 enzyme activity mutants in Figure 6 supplement 2.

      (14) Figure 6: In panel H the MW for UBR5 is incorrect, should be around 300kDa.

      Thank you for the reviewer's detailed suggestions. We have made the necessary revisions in Figure 6H.

      (15) Line 267: "suggesting a more conserved sequence". What are the authors referring to? More conserved than what? This section would benefit from a discussion of which residues are mutated. Are they potential Ub sites, which could point to differential degradation by the E3s as due to more ubiquitination? Or rather to more efficient interaction with the E3? Is this conserved in related CoVs: original SARS and MERS, for instance?

      Thank you for the reviewer’s detailed suggestions. In this context, by “conservation,” we refer to the relative conservation of nsp16 proteins across different subtypes of the Omicron variant. We found that most of the mutation sites contained only 1 to 2 mutations. Additionally, we have constructed and validated multiple-mutant nsp16 proteins, which are still degraded by MARCHF7 or UBR5. Given the ongoing prevalence of the Omicron variant, we aim to explore the broad-spectrum degradation and antiviral effects of these two E3 ligases. While it would be ideal if these experiments could aid in identifying the ubiquitination sites, we have not yet identified any mutant forms that escape degradation. We also compared the nsp16 proteins of several other coronaviruses (such as human coronaviruses 229E, HKU1, MERS-CoV, NL63, OC43, and SARS-CoV-1), and found that these viruses' nsp16 proteins are not highly conserved. As a result, we have not further investigated whether MARCHF7 or UBR5 regulate the nsp16 proteins of these viruses.

      (16) Line 347: 2C of what virus?

      Thank you for the reviewer’s careful reading. We have made the necessary additions to address this point in line 357.

      (17) Line 890: "Scale bars, 25 mm". Should it be 25nm?

      Thank you for your feedback! I realized there was an error in the unit labeling, and I have corrected the relevant sections in line 904. I appreciate your careful reading.

      Reviewer #2 (Recommendations for the authors):

      (1) In Figure 6, the authors found that increasing amounts of nsp16 restored the replication of SARS-CoV-2 in the presence of MARCHF7 or UBR5. The authors should better discuss the possibility that nsp16 may stimulate viral replication regardless of these E3 ligases, or provide evidence to further clarify this.

      Thank you for your thoughtful suggestion! Given the strong functionality of nsp16 itself, your consideration is very comprehensive. In Figure 6—figure supplement 2A–H, we conducted transfection experiments with E3 activity-deficient proteins and reintroduced nsp16. The results showed that, in the absence of active MARCHF7 or UBR5 antiviral function, overexpression of nsp16 did not promote viral replication, although the RNA levels of the M protein slightly increased. Therefore, in our experiments, excess nsp16 did not significantly stimulate viral replication.

      (2) In Figure 7, the in vivo data supports the function of both E3 ligases to reduce viral infectivity. Is it possible that tail vein injection of naked plasmid DNA may stimulate the innate immune system, e.g., induce IFN as a DNA vaccine, which may contribute to the inhibitory effect? The authors are suggested to discuss or address it.

      Upon reviewing the relevant literature, we found that the hydrodynamic gene delivery (HGD) method using naked DNA is both highly efficient and associated with a low risk of triggering immune responses or oncogenesis. Studies have shown that HGD only weakly activates host immunity (reference: 37111597), which is less of a concern compared to other gene delivery methods. Although some studies have reported strong immune responses following the injection of naked DNA (e.g., Otc cDNA) in human trials, it is noteworthy that no such responses were observed in 17 other participants. This suggests that the immune reactions observed in some cases may be due to individual variability or limitations in animal models, which may not fully translate to human trials.

      Based on these findings, we believe that the antiviral effects observed in our study are primarily attributable to the intrinsic properties and functions of the E3 ligases. Furthermore, it has been reported that mice and non-human primates exhibit significantly greater resistance to innate immune activation compared to humans. This highlights the challenges in translating these findings into effective antiviral therapeutics and underscores the need for further research in this area. We have incorporated the requested discussion into the manuscript in lines 393-410.

      (3) The authors shall include some of the key data in supplementary figures in the main text, such as the study on UBR5 and MARCHF7 mediate broad-spectrum degradation of nsp16 variants and SARS-CoV-2 infection decreases UBR5 and MARCHF7 expression, which make it easier for readers to follow.

      Thank you for your valuable suggestion regarding the organization of our manuscript. In response to your feedback, we have moved the study on nsp16 variants to the Figure 6—figure supplement 3. Additionally, the data showing changes in UBR5 and MARCHF7 levels following viral infection have been added as supplementary data in Figure 6—figure supplement 4.

      (4) The diagrammatic sketches in Figures 1E, S1A and B, 7A, and 8 had low resolutions. Please change them to higher resolutions. Moreover, please state the licensing rights of these diagrammatic sketches.

      Thank you for your detailed review! In response to your comment, we have improved the resolution of Figures 1E, S1A and B, 7A, and 8. Additionally, we have specified the drawing tools and source websites in the figure legends (lines 794, 813, 999, and 1013). We have also obtained the necessary licenses for each diagram.

      Figure 1E: Created in BioRender. Li, Z. (2025) https://BioRender.com/h43f612

      Figure S1B: Created in BioRender. Li, Z. (2025) https://BioRender.com/b98t559

      Figure 7A: Created in BioRender. Li, Z. (2025) https://BioRender.com/e76g512

      Figure 8: Created in BioRender. Li, Z. (2025) https://BioRender.com/o84p897

      (5) The authors suggested that both UBR5 and MARCHF7 had a function in triggering the degradation of NSP16, however, the expression of UBR5 but not MARCHF7 was shown to be associated with the severity of clinical symptoms. Further, why did the host evolve 2 kinds of E3 ligases to adjust only 1 viral target? Please discuss them.

      Thank you for your insightful comments. We acknowledge that the limited number of patients with varying degrees of illness in our study could potentially mask some of the observed phenomena. Additionally, individual variability may also play a significant role, which highlights the challenges in translating findings from animal models to human trials.

      Regarding the presence of two E3 ligases targeting the same substrate, we view this as part of an evolutionary arms race between the host and the virus. Viruses evolve mechanisms to counteract the host’s antiviral responses, while the host, in turn, develops multiple pathways and strategies to combat viral infection. This dynamic may explain why multiple E3 ligases regulate the levels of the same factor, reflecting the host’s complex and redundant antiviral defense mechanisms. We have incorporated the requested discussion into the manuscript in lines 359-362.

      (6) Please standardize the symbol size of the bar charts in the same figure, just like in Figures 1D and 5.

      Thank you for your constructive suggestion. We have standardized the symbol sizes of the bar charts in the figure as per your recommendation, ensuring consistency across all panels.

      (7) The use of English could be improved.

      Thank you for your feedback regarding the language. We have carefully reviewed the manuscript and made revisions to improve the clarity and fluency of the English.

      Reviewer #3 (Recommendations for the authors):

      Major points:

      (1) In Figure 1: The expression level of NSP6, 10, 11, and 12 is weak. Include a higher exposure blot (right next to these blots marking as higher exposure) to show the expression of these plasmids. Here, the NSP12 plasmid has no expression, so it is difficult to conclude the effect of MG132 from this blot. It will be appropriate to show the molecular weight of each gene fragment since some of the plasmids have multiple bands. Verify the densitometric analysis, the NSP4 (+/- MG132) blot, and the densitometric analysis do not correlate. Figure 1B: It is recommended to include appropriate control (media only) for NH4Cl. The DMSO control serves well for the drugs, not for Ammonium Chloride. In Figure 1C, how did the authors arrive at the 15-hour time point? The correlation does not appear as the authors claim. Where is the 15-hour sampling time point for MG132 or CHX chase? The experimental approach to screen the E2/E3 Ub ligase is appreciated.

      Thank you for your valuable feedback! Regarding your questions, we have made the following revisions:

      On the expression of nsp6, nsp10, nsp11, and nsp12 in Figure 1:

      We have replaced the blots for nsp10, nsp11, and nsp12 with higher exposure blots. However, due to the strong expression of NSP14, we were unable to generate a higher exposure blot for nsp6. Based on the current exposure, it is clear that nsp6 is not regulated by the proteasome. Additionally, in the high-exposure blot for nsp12, we were able to observe its expression and found that this protein is weakly regulated by MG132. Following your suggestion, we have labeled the molecular weights of the proteins in the figure.

      On the densitometric analysis of nsp4 protein:

      We recalculated the densitometric analysis for nsp4 and found no issues. Although the band intensities do not show large changes, the relative fold changes appear more pronounced because we normalized the data using GAPDH as an internal control. We have added a detailed description in the figure legend.
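      The effect of loading-control normalization described above can be illustrated with a minimal sketch. The band intensities below are hypothetical values chosen only for illustration; they are not data from the manuscript, and the function name is likewise an assumption.

```python
def normalized_fold_change(target_ctrl, loading_ctrl, target_treated, loading_treated):
    """Fold change of a target band after normalizing each lane to its loading control.

    All arguments are densitometric band intensities (arbitrary units).
    """
    ctrl_ratio = target_ctrl / loading_ctrl           # e.g. nsp4 / GAPDH in the DMSO lane
    treated_ratio = target_treated / loading_treated  # e.g. nsp4 / GAPDH in the MG132 lane
    return treated_ratio / ctrl_ratio


# Hypothetical intensities: the target band changes only modestly (1000 -> 1200),
# but the loading control is weaker in the treated lane (2000 -> 1500).
raw_fold = 1200.0 / 1000.0
norm_fold = normalized_fold_change(1000.0, 2000.0, 1200.0, 1500.0)
print(f"raw fold change: {raw_fold:.2f}, normalized fold change: {norm_fold:.2f}")
```

      In this hypothetical case the raw band ratio is modest, while the normalized fold change is larger because the loading control differs between lanes.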

      On the NH4Cl control:

      In this experiment, ammonium chloride was dissolved in DMSO. We reviewed the solubility data and found that ammonium chloride has a solubility of 50 mg/ml in DMSO, which is sufficient to reach the concentrations used in our experiment. While the solubility is higher in water, we believe that DMSO is an appropriate solvent for this compound in our context.

      On the 15-hour time point in Figure 1C:

      Regarding the 15-hour time point mentioned in Figure 1C, we did not collect samples at that time. We performed semi-quantitative analysis of protein levels at different time points using ImageJ and estimated the half-life time point based on the half-life calculation formula. Thank you for your suggestion; we will clarify this in the figure legend.
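      How a half-life can fall at a time point that was never sampled can be sketched as follows. This is a minimal illustration that assumes first-order (exponential) decay and a log-linear least-squares fit; the response does not state which formula was actually used, and the time points and relative levels below are hypothetical.

```python
import math


def estimate_half_life(times_h, levels):
    """Estimate a half-life by fitting ln(level) = ln(L0) - k * t (least squares).

    Assumes first-order decay; returns ln(2) / k in the same units as times_h.
    """
    logs = [math.log(v) for v in levels]
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, logs))
    k = -(sxy / sxx)  # decay rate constant is the negative slope
    if k <= 0:
        raise ValueError("levels do not decay over time; half-life is undefined")
    return math.log(2) / k


# Hypothetical chase: samples taken at 0, 4, 8 and 12 h, with relative levels
# generated from an exact exponential decay whose half-life is 15 h.
times_h = [0, 4, 8, 12]
levels = [0.5 ** (t / 15) for t in times_h]
half_life_h = estimate_half_life(times_h, levels)
print(f"estimated half-life: {half_life_h:.1f} h")
```

      Under these assumptions the estimated half-life lies beyond the last sampled time point, so it is an extrapolation from the fitted decay rate rather than a measured sample.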

      Once again, thank you for your thoughtful review and constructive suggestions. We have made the necessary revisions and improvements to the figures based on your feedback.

      (2) In Figure 2: I do not find a reason to include DMSO control in the siRNAs for E2/E3 Ub. Please justify why it is necessary. It is requested to include WB for the siRNA-treated samples. It is strongly recommended to show the WB data for siRNA-treated samples because you are showing siRNA treatment of MARCHF7 in shUBR5 cells and vice versa. However, if antibodies for corresponding targets are not available, qPCR can be shown in graphical representation in supplementary data indicating the siRNA target region and qPCR target. Show a graphical representation of domains/ deleted regions of MARCHF7 and UBR5.

      Thank you for your valuable feedback! We have addressed your concerns as follows:

      On the inclusion of the DMSO control group:

      The DMSO group was initially included as a control for the MG132-treated group. By comparing with the MG132 group, we aimed to observe whether nsp16 levels were restored by MG132 treatment. Additionally, in siRNA knockdown experiments, the DMSO group was included to compare nsp16 protein levels after knockdown with those in the NC group, as well as to assess differences in nsp16 restoration between MG132 treatment and factor knockdown. However, we acknowledge some issues in the control design. To address this, we have redesigned and conducted the experiments with improved controls (Figure 2A).

      On validating knockdown efficiency:

      We have included Western blot data for UBR5 and MARCHF7 knockdown efficiencies. For other factors where specific antibodies were unavailable, we followed your suggestion and provided graphical representations in the Appendix-figure S1, illustrating the siRNA target regions and qPCR target sites to confirm knockdown specificity and efficiency.

      (3) In Figure 4A: Write details on how this IP was done. What was the transfection time of this plasmid? Is the transfection time different from that of NSP16 in Figure 1A, which shows a significant degradation of NSP16? Please discuss this in detail. It is recommended that this IP be done in +/- MG132. Since you have used siRNA and performed an IP, it is recommended to repeat the IP (with +/- MG132) using the MARCHF7 and UBR5 plasmids.

      Thank you for your detailed review and suggestions! We have addressed your concerns as follows:

      On the specific protocol for the co-IP in Figure 3A:

      The detailed protocol for the immunoprecipitation (IP) experiment is as follows: on day 1, cells were plated, and on day 2, we co-transfected nsp16 and Ub expression plasmids. After 32 hours of transfection, we treated the cells with MG132 for 16 hours, then harvested the cells for IP. We included MG132 treatment in all ubiquitination IP experiments because, without MG132, nsp16 would be degraded, preventing us from observing changes in ubiquitination levels. We apologize for not clearly labeling this in the figure, and we have made the necessary modifications.

      On the use of MG132 and NSP16 degradation:

      Following your suggestion, we have clarified the use of MG132 in the IP experiments, which differs from the degradation of nsp16 shown in Figure 1A. In Figure 1A, we show the degradation of nsp16 in the absence of MG132 treatment.

      On the overexpression of UBR5 and MARCHF7:

      The effect of overexpressing UBR5 or MARCHF7 on ubiquitination has been validated in Figure 4 supplement 2. In these experiments, we explored the effect of UBR5 activity domain inactivation on nsp16 ubiquitination, as well as the effect of MARCHF7 truncation on nsp16 ubiquitination modification. In these experiments, overexpression of the wild-type E3 ligases was also included, and the results yielded the same conclusions as those from the E3 knockdown experiments, thereby validating the robustness of our findings.

      (4) In Figure 4C: Appropriate controls are missing. The authors claim NSP16 is ubiquitinated and degraded by UBR5 and MARCHF7 via K27 and K48 chains. There is no NSP16 Only control. We cannot compare the NSP16 without an NSP16 transfection. I will suggest the authors repeat these individual controls in both the presence and absence of MG132.

      Thank you for your careful review and valuable suggestion! In response to your comment, we have redesigned the experiment and added a control group without nsp16 transfection. We have repeated the validation in the presence of MG132. Without MG132 treatment, nsp16 is degraded, leading to very low protein levels, making it difficult to observe the phenomenon. We have updated the figure accordingly and made the necessary adjustments based on your suggestion (Figure 3E-F).

      (5) In my opinion, Figure 8 needs modification. It is requested to show the levels of strand-specific viral mRNA under UBR5 and MARCHF7 knock-down in +/- of MG132. This figure should also be supported by WB indicating the level of NSP16 (capping activity) and any of the viral proteins. This may validate that if the capping activity is lost, viral translation is affected and hence there is a reduction in virus titre. Alternatively, the figure can be modified by putting a sub-heading box over 7mGppA-RNA section and marking it as a future direction/ hypothesis.

      Thank you for your thorough and thoughtful review! Regarding the modification of Figure 8, we completely agree with your suggestion. Currently, examining the impact of viral RNA cap modification is technically challenging for us. Therefore, we have followed your advice and marked the investigation of how nsp16 degradation affects viral RNA cap structures as a future direction/hypothesis in the schematic of Figure 8. This revision helps provide direction for future experiments and enhances the clarity of the figure. Thank you for your thoughtful consideration and valuable suggestion!

      Minor points:

      (1) Figure 2A: Align NSP16 Blot to actin.

      Thank you for your constructive feedback! We have redesigned the experiment and included an MG132 treatment group in Figure 2A. Consequently, the figure has been revised comprehensively, and the nsp16 blot has been aligned with tubulin.

      (2) Figure 2C: It is recommended to properly align the lanes where the pLKO and shRNA labelling are overlapping.

      Thank you for your thoughtful suggestion! We have revised Figure 2C based on your recommendation to ensure that the pLKO and shRNA labeling no longer overlap. We sincerely apologize for any confusion this may have caused and appreciate your understanding and support.

      (3) Just a curious question, what happens if we silence both UBR5 and MARCHF7 and check for virus titre? This is an additional work, but if the authors do not agree, it is ok.

      Thank you for your valuable suggestion! Regarding your question about silencing both UBR5 and MARCHF7, we indeed attempted to generate knockout cell lines, but unfortunately, we were not successful at this stage. We plan to explore alternative methods to establish stable knockout cell lines in our future experiments. Meanwhile, as shown in Figure 5 supplement 1, we have performed experiments where both UBR5 and MARCHF7 were knocked down simultaneously, followed by infection with virus-like particles. The results indicate that dual knockdown further enhances viral replication. These findings may partially address your question. Thank you again for your insightful suggestion!

    1. eLife Assessment

      This study provides a valuable contribution to our understanding of causal inference in visual perception. The evidence provided through multiple well-designed psychophysical experiments is convincing. The current study targets very specific visual features of launch events; future work will be able to build on this to study the implementation of causal inference in general.

    2. Reviewer #1 (Public review):

      Summary:

      The authors investigated causal inference in the visual domain through a set of carefully designed experiments, and sound statistical analysis. They suggest the early visual system has a crucial contribution to computations supporting causal inference.

      Strengths:

      (1) I believe the authors target an important problem (causal inference) with carefully chosen tools and methods. Their analysis rightly implies the specialization of visual routines for causal inference and the crucial contribution of early visual systems to perform this computation. I believe this is a novel contribution and their data and analysis are in the right direction.

      (2) Authors sufficiently discuss the alternative perspective to causal inference.

      (3) The authors also expand the discussions beyond pure psychophysics and also include neural aspects.

      Weaknesses:

      I would not call them weaknesses, perhaps a different perspective:

      (1) The authors argue for a mere bottom-up contribution of early sensory areas to causal inference. Certainly, as the authors suggested, early sensory areas have a crucial contribution, and the authors expand it to other possibilities in their discussion (but mostly for more complex scenarios). I would say that, even in simple cases, we can still consider the effect of top-down processes. This particularly makes sense in light of recent studies, which increasingly suggest that perception is an active process in which top-down cognitive contributions also weigh in strongly. For instance, the simplest cases of perception have been conceptualized along this line (Martin, Solms, and Sterzer 2021), as have some visual illusions (Safavi and Dayan 2022) and other extensions (Kay et al. 2023). Thus, I believe it would be helpful to extend the discussion on the top-down and cognitive contributions to causal inference (of course that can also be hinted at, based on recent developments). Even adaptation, which is central in this study, can be influenced by top-down factors (Keller et al. 2017).

      Lastly, I hope the authors find this review helpful. I generally want to try to end all of my reviews with areas of the paper I liked because I think this should be part of the feedback. Certainly, there were many in this manuscript as well (clever questions, experimental design and statistical analysis) that I had to highlight further. I congratulate the authors again on their manuscript and hope they will find it helpful.

      Bibliography

      Aller, Mate, and Uta Noppeney. 2018. "To Integrate or Not to Integrate: Temporal Dynamics of Bayesian Causal Inference." Biorxiv, December, 504118.

      Cao, Yinan, Christopher Summerfield, Hame Park, Bruno Lucio Giordano, and Christoph Kayser. 2019. "Causal Inference in the Multisensory Brain." Neuron 102 (5): 1076-87.e8.

      Coen, Philip, Timothy P. H. Sit, Miles J. Wells, Matteo Carandini, and Kenneth D. Harris. 2021. "The Role of Frontal Cortex in Multisensory Decisions." Biorxiv, April. Cold Spring Harbor Laboratory, 2021.04.26.441250.

      Kay, Kendrick, Kathryn Bonnen, Rachel N. Denison, Mike J. Arcaro, and David L. Barack. 2023. "Tasks and Their Role in Visual Neuroscience." Neuron 111 (11). Elsevier: 1697-1713.

      Keller, Andreas J, Rachael Houlton, Björn M Kampa, Nicholas A Lesica, Thomas D Mrsic-Flogel, Georg B Keller, and Fritjof Helmchen. 2017. "Stimulus Relevance Modulates Contrast Adaptation in Visual Cortex." Elife 6. eLife Sciences Publications, Ltd: e21589.

      Kording, K. P., U. Beierholm, W. J. Ma, S. Quartz, J. B. Tenenbaum, and L. Shams. 2007. "Causal Inference in Multisensory Perception." PloS One 2: e943.

      Martin, Joshua M., Mark Solms, and Philipp Sterzer. 2021. "Useful Misrepresentation: Perception as Embodied Proactive Inference." Trends Neurosci. 44 (8): 619-28.

      Safavi, Shervin, and Peter Dayan. 2022. "Multistability, Perceptual Value, and Internal Foraging." Neuron, August.

      Shams, L. 2012. "Early Integration and Bayesian Causal Inference in Multisensory Perception." In The Neural Bases of Multisensory Processes, edited by M. M. Murray and M. T. Wallace. Frontiers in Neuroscience. Boca Raton (FL).

      Shams, Ladan, and Ulrik Beierholm. 2022. "Bayesian Causal Inference: A Unifying Neuroscience Theory." Neuroscience & Biobehavioral Reviews 137 (June): 104619.

    3. Reviewer #2 (Public review):

      This paper seeks to determine whether the human visual system's sensitivity to causal interactions is tuned to specific parameters of a causal launching event, using visual adaptation methods. The three parameters the authors investigate in this paper are the direction of motion in the event, the speed of the objects in the event, and surface features or identity of the objects in the event (in particular, having two objects of different color).

      The key method, visual adaptation to causal launching, has now been demonstrated by at least three separate groups and seems to be a robust phenomenon. Adaptation is a strong indicator of a visual process that is tuned to a specific feature of the environment, in this case launching interactions. Whereas other studies have focused on retinotopically-specific adaptation (i.e., whether the adaptation effect is restricted to the same test location on the retina as the adaptation stream was presented to), this one focuses on feature-specificity.

      The first experiment replicates the adaptation effect for launching events as well as the lack of an adaptation effect for a minimally different non-causal 'slip' event. However, it also finds that the adaptation effect does not work for launching events whose direction of motion differs by more than 30 degrees from the direction of the test event. The interpretation is that the system that is being adapted is sensitive to the direction of this event, which is an interesting and somewhat puzzling result given the methods used in previous studies, which have used random directions of motion for both adaptation and test events.

      The obvious interpretation would be that past studies have simply adapted to launching in every direction, but that in itself says something about the nature of this direction-specificity: it is not working through opposed detectors. For example, in something like the waterfall illusion adaptation effect, where extended exposure to downward motion leads to illusory upward motion on neutral-motion stimuli, the effect simply doesn't work if motion in two opposed directions is shown (i.e., you don't see illusory motion in both directions, you just see nothing). The fact that adaptation to launching in multiple directions doesn't seem to cancel out the adaptation effect in past work raises interesting questions about how directionality is being coded in the underlying process. In addition, one limitation of the current method is that it's not clear whether the motion-direction-specificity is also itself retinotopically-specific, that is, if one retinotopic location were adapted to launching in one direction and a different retinotopic location adapted to launching in the opposite direction, would each test location show the adaptation effect only for events in the direction presented at that location?

      The second experiment tests whether the adaptation effect is similarly sensitive to differences in speed. The short answer is no; adaptation events at one speed affect test events at another. Furthermore, this is not surprising given that Kominsky & Scholl (2020) showed adaptation transfer between events with differences in speeds of the individual objects in the event (whereas all events in this experiment used symmetrical speeds). This experiment is still novel and it establishes that the speed-insensitivity of these adaptation effects is fairly general, but I would certainly have been surprised if it had turned out any other way.

      The third experiment tests color (as a marker of object identity), and pits it against motion direction. The results demonstrate that adaptation to red-launching-green generates an adaptation effect for green-launching-red, provided they are moving in roughly the same direction, which provides a nice internal replication of Experiment 1 in addition to showing that the adaptation effect is not sensitive to object identity. This result forms an interesting contrast with the infant causal perception literature. Multiple papers (starting with Leslie & Keeble, 1987) have found that 6-8-month-old infants are sensitive to reversals in causal roles exactly like the ones used in this experiment. The success of adaptation transfer suggests, very clearly, that this sensitivity is not based only on perceptual processing, or at least not on the same processing that we access with this adaptation procedure. It implies that infants may be going beyond the underlying perceptual processes and inferring genuine causal content. This is also not the first time the adaptation paradigm has diverged from infant findings: Kominsky & Scholl (2020) found a divergence with the object speed differences as well, as infants categorize these events based on whether the speed ratio (agent:patient) is physically plausible (Kominsky et al., 2017), while the adaptation effect transfers from physically implausible events to physically plausible ones. This only goes to show that these adaptation effects don't exhaustively capture the mechanisms of early-emerging causal event representation.

      One overarching point about the analyses to take into consideration: The authors use a Bayesian psychometric curve-fitting approach to estimate a point of subjective equality (PSE) in different blocks for each individual participant based on a model with strong priors about the shape of the function and its asymptotic endpoints, and this PSE is the primary DV across all of the studies. As discussed in Kominsky & Scholl (2020), this approach has certain limitations, notably that it can generate nonsensical PSEs when confronted with relatively extreme response patterns. The authors mentioned that this happened once in Experiment 3, and that participant had to be replaced. An alternate approach is simply to measure the proportion of 'pass' reports overall to determine if there is an adaptation effect. The results here do not change based on which analytical strategy is used, which ultimately just goes to show that the effects are very robust.
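      The contrast between the two analysis strategies can be sketched in a few lines of Python. This is not the authors' Bayesian fitting procedure: it swaps in a plain two-parameter logistic function fitted by grid-search maximum likelihood, and it assumes that 'pass' reports become more likely as disc overlap increases. The overlap levels, response counts, and function names are all invented for illustration.

```python
import math

def pass_probability(overlap, pse, slope):
    """Logistic psychometric function: P('pass') at a given disc overlap (assumed form)."""
    return 1.0 / (1.0 + math.exp(-slope * (overlap - pse)))

def neg_log_likelihood(pse, slope, overlaps, n_pass, n_trials):
    """Binomial negative log-likelihood of the observed 'pass' counts."""
    total = 0.0
    for x, k, n in zip(overlaps, n_pass, n_trials):
        # Clamp to avoid log(0) at the extremes of the curve.
        p = min(max(pass_probability(x, pse, slope), 1e-9), 1.0 - 1e-9)
        total -= k * math.log(p) + (n - k) * math.log(1.0 - p)
    return total

def fit_pse(overlaps, n_pass, n_trials):
    """Grid-search maximum-likelihood estimate of (PSE, slope)."""
    best_nll, best_pse, best_slope = float("inf"), None, None
    for i in range(101):              # candidate PSEs: 0.00, 0.01, ..., 1.00
        pse = i / 100
        for slope in range(1, 61):    # candidate slopes: 1, 2, ..., 60
            nll = neg_log_likelihood(pse, slope, overlaps, n_pass, n_trials)
            if nll < best_nll:
                best_nll, best_pse, best_slope = nll, pse, slope
    return best_pse, best_slope

def proportion_pass(n_pass, n_trials):
    """Model-free alternative: overall proportion of 'pass' reports."""
    return sum(n_pass) / sum(n_trials)

# Hypothetical data for one participant (not from the paper).
overlaps = [0.0, 0.25, 0.5, 0.75, 1.0]   # proportion of disc overlap
n_trials = [20, 20, 20, 20, 20]
pre_pass = [1, 3, 10, 17, 19]            # 'pass' counts before adaptation
post_pass = [3, 8, 15, 19, 20]           # 'pass' counts after adaptation

pse_pre, _ = fit_pse(overlaps, pre_pass, n_trials)
pse_post, _ = fit_pse(overlaps, post_pass, n_trials)
print("PSE before / after adaptation:", pse_pre, pse_post)
print("Proportion 'pass' before / after:",
      proportion_pass(pre_pass, n_trials), proportion_pass(post_pass, n_trials))
```

      In this toy data set both measures point the same way: the PSE moves toward smaller overlaps and the overall proportion of 'pass' reports rises. The proportion stays well defined even for extreme response patterns (for example, all 'pass'), whereas a fitted PSE is then poorly constrained.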

      In general, this paper adds further evidence for something like a 'launching' detector in the visual system, but beyond that it specifies some interesting questions for future work about how exactly such a detector might function.

      Kominsky, J. F., & Scholl, B. J. (2020). Retinotopic adaptation reveals distinct categories of causal perception. Cognition, 203, 104339. https://doi.org/10.1016/j.cognition.2020.104339

      Kominsky, J. F., Strickland, B., Wertz, A. E., Elsner, C., Wynn, K., & Keil, F. C. (2017). Categories and Constraints in Causal Perception. Psychological Science, 28(11), 1649-1662. https://doi.org/10.1177/0956797617719930

      Leslie, A. M., & Keeble, S. (1987). Do six-month-old infants perceive causality? Cognition, 25(3), 265-288. https://doi.org/10.1016/S0010-0277(87)80006-9

    4. Reviewer #3 (Public review):

      Summary:

      This paper presents evidence from three behavioral experiments that causal impressions of "launching events", in which one object is perceived to cause another object to move, depend on motion direction-selective processing. Specifically, the work uses an adaptation paradigm (Rolfs et al., 2013), presenting repetitive patterns of events matching certain features to a single retinal location, then measuring subsequent perceptual reports of a test display in which the degree of overlap between two discs was varied, and participants could respond "launch" or "pass". The three experiments report results of adapting to motion direction, motion speed and "object identity", and examine how the psychometric curves for causal reports shift in these conditions depending on the similarity of adapter and test. While causality reports in the test display were selective for motion direction (Experiment 1), they were not selective for adapter-test speed differences (Experiment 2) nor for changes in object identity induced via color swap (Experiment 3). These results support the notion of a biological implementation of causality perception in the visual system, possibly even independently of computations of object identity.

      Strengths:

      The setup of the research question and hypotheses is exceptional. The authors thoroughly discuss relevant literature to clearly link their launch/pass paradigm to impressions of causality, strengthening their hypothesis and conclusions. The experiments are carefully performed (appropriate equipment, careful control of eye movements). The slip adaptor is a really nice control condition and effectively mitigates the need to control for motion direction with a drifting grating or similar. Participants were measured with sufficient precision, and a power curve analysis was conducted to determine the sample size. Data analysis and statistical quantification are appropriate. Data and analysis code will be shared on publication, in keeping with open science principles. The paper is concise and well written.

      Weaknesses:

      I would like to emphasise that in the paradigm employed here, and in a previously conducted similar study, the only report options are "launch" or "pass". As pointed out in the authors' reply, the adaptation to launches seems to be a highly specific process and is likely a consequence of the causal interaction between the objects. I would nonetheless be interested to see which of the stimulus features driving the adaptation effect observed here are relevant or irrelevant to subjective causal impressions in an experiment.

      References:

      Rolfs, M., Dambacher, M., & Cavanagh, P. (2013). Visual Adaptation of the Perception of Causality. Current Biology, 23(3), 250-254. https://doi.org/10.1016/j.cub.2012.12.017

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary: 

      The authors investigated causal inference in the visual domain through a set of carefully designed experiments, and sound statistical analysis. They suggest the early visual system has a crucial contribution to computations supporting causal inference. 

      Strengths: 

      I believe the authors target an important problem (causal inference) with carefully chosen tools and methods. Their analysis rightly implies the specialization of visual routines for causal inference and the crucial contribution of early visual systems to perform this computation. I believe this is a novel contribution and their data and analysis are in the right direction. 

      Weaknesses: 

      In my humble opinion, a few aspects deserve more attention: 

      (1) Causal inference (or causal detection) in the brain should be quite fundamental and quite important for human cognition/perception. Thus, the underlying computation and neural substrate might not be limited to the visual system (I don't mean the authors did claim that). In fact, to the best of my knowledge, multisensory integration is one of the best-studied perceptual phenomena that has been conceptualized as a causal inference problem.

      Assuming the causal inference in those studies (Shams 2012; Shams and Beierholm 2022; Kording et al. 2007; Aller and Noppeney 2018; Cao et al. 2019) (and many more e.g., by Shams and colleagues), and the current study might share some attributes, one expects some findings in those domains are transferable (at least to some degree) here as well. Most importantly, underlying neural correlates that have been suggested based on animal studies and invasive recording that has been already studied, might be relevant here as well.

      Perhaps the most relevant one is the recent work from the Harris group on mice (Coen et al. 2021). I should emphasize that I don't claim they are necessarily relevant, but they can be relevant given their common roots in the problem of causal inference in the brain. This is a critical topic that the authors may want to discuss in their manuscript. 

      We thank the reviewer. We addressed this point of the public review in our reply to the reviewer’s suggestions (and add it here again for convenience). The literature on the role of occipital, parietal and frontal brain areas in causal inference is also addressed in the response to point 3 of the public review.

      “We used visual adaptation to carve out a bottom-up visual routine for detecting causal interactions in form of launching events. However, we know that more complex behaviors of perceiving causal relations can result from integrating information across space (e.g., in causal capture; Scholl & Nakayama, 2002), across time (postdictive influence; Choi & Scholl, 2006), and across sensory modalities (Sekuler, Sekuler, & Lau, 1997). Bayesian causal inference has been particularly successful as a normative framework to account for multisensory integration (Körding et al., 2007; Shams & Beierholm, 2022). In that framework, the evidence for a common-cause hypothesis is competing with the evidence for an independent-causes hypothesis (Shams & Beierholm, 2022). The task in our experiments could be similarly formulated as two competing hypotheses for the second disc’s movement (i.e., the movement was caused by the first disc vs. the movement occurred autonomously). This framework also emphasizes the distributed nature of the neural implementation for solving such inferences, showing the contributions of parietal and frontal areas in addition to sensory processing (for review see Shams & Beierholm, 2022). Moreover, even visual adaptation to contrast in mouse primary visual cortex is influenced by top-down factors such as behavioral relevance— suggesting a complex implementation of the observed adaptation results (Keller et al. 2017). The present experiments, however, presented purely visual events that do not require an integration across processing domains. Thus, the outcome of our suggested visual routine can provide initial evidence from within the visual system for a causal relation in the environment that may then be integrated with signals from other domains (e.g., auditory signals). Determining exactly how the perception of causality relates to mechanisms of causal inference and the neural implementation thereof is an exciting avenue for future research. 
      Note, however, that perceived causality can be distinguished from judged causality: Even when participants are aware that a third variable (e.g., a color change) is the best predictor of the movement of the second disc in launching events, they still perceive the first disc as causing the movement of the second disc (Schlottmann & Shanks, 1992).”

      (2) If I understood correctly, the authors are arguing for a purely bottom-up contribution of early sensory areas to causal inference (for instance, when they wrote "the specialization of visual routines for the perception of causality at the level of individual motion directions raises the possibility that this function is located surprisingly early in the visual system *as opposed to a higher-level visual computation*."). Certainly, as the authors suggested, early sensory areas make a crucial contribution; however, the computation may not be limited to them. Recent studies increasingly suggest that perception is an active process in which top-down cognitive contributions also weigh in strongly. For instance, even the simplest cases of perception have been conceptualized along this line (Martin, Solms, and Sterzer 2021), as have some visual illusions (Safavi and Dayan 2022) and other extensions (Kay et al. 2023). Thus, I believe it would be helpful to extend the discussion on the top-down and cognitive contributions of causal inference (of course that can also be hinted at, based on recent developments). Even adaptation, which is central in this study, can be influenced by top-down factors (Keller et al. 2017). I believe, based on other work of Rolfs and colleagues, this is also aligned with their overall perspective on vision.

      Indeed, we assessed bottom-up contributions to the perception of a causal relation. We agree with the reviewer that in more complex situations, for instance, in the presence of contextual influences or additional auditory signals, the perception of a causal relation may not be limited to bottom-up vision. While we had acknowledged this in the original manuscript (see excerpts below), we now make it even more explicit:

      “[…] we know that more complex behaviors of perceiving causal relations can result from integrating information across space (e.g., in causal capture; Scholl & Nakayama, 2002), across time (postdictive influence; Choi & Scholl, 2006), and across sensory modalities (Sekuler, Sekuler, & Lau, 1997).”

      “[…] Neurophysiological studies support the view of distributed neural processing underlying sensory causal interactions with the visual system playing a major role.”

      “[…] Interestingly, single cell recordings in area F5 of the primate brain revealed that motor areas are contributing to the perception of causality (Caggiano et al., 2016; Rolfs, 2016), emphasizing the distributed nature of the computations underlying causal interactions. This finding also stresses that the detection, and the prediction, of causality is essential for processes outside sensory systems (e.g., for understanding others’ actions, for navigating, and for avoiding collisions). The neurophysiology subserving causal inference further extends the candidate cortical areas that might contribute to the detection of causal relations, emphasizing the role of the frontal cortex for the flexible integration of multisensory representations (Cao et al., 2019; Coen et al., 2023).”

      However, there is also ample evidence that the perception of a simple causal relation, as we studied it in our experiments, escapes top-down cognitive influences. The perception of causality in launching events is described as automatic and irresistible, meaning that participants have the spontaneous impression of a causal relation, and participants typically do not voluntarily switch between a causal and a noncausal percept. This irresistibility has led several authors to discuss a modular organization underlying the detection of such events (Michotte, 1963; Scholl & Tremoulet, 2000). This view is further supported by a study that experimentally manipulated the contingencies between the movement of the two discs (Schlottmann & Shanks, 1992). In one condition the authors created a launching event where the second disc’s movement was perfectly correlated with a color change, but only sometimes coincided with the first disc’s movement offset. Nevertheless, participants reported seeing that the first disc caused the movement of the second disc (regardless of the stronger statistical relationship with the color change). However, when asked to make conscious causal judgments, participants were aware of the color change as the true cause of the second disc’s motion, therefore recognizing its more reliable correlation. This study strongly suggests that perceived and judged causality (i.e., cognitive causal inference) can be dissociated (Schlottmann & Shanks, 1992). We have added this reference in the revised manuscript. Overall, we argue that our study focused on a visual routine that could be implemented in a simple bottom-up fashion, but we acknowledge throughout the manuscript that in a more complex situation (e.g., integrating information from other sensory domains) the implementation could be realized in a more distributed fashion including top-down influences as in multisensory integration. 
      However, it is important to stress that these potential top-down influences would be automatic and should not be confused with voluntary cognitive influences.

      “Note, however, that perceived causality can be distinguished from judged causality (Schlottmann & Shanks, 1992). Even when participants are aware that a third variable (e.g., a color change) is the best predictor of the movement of the second disc in launching events, they still perceive the first disc as causing the movement of the second disc (Schlottmann & Shanks, 1992).”

      (3) The authors rightly implicate the neural substrate of causal inference in the early sensory system. Given their study is pure psychophysics, a more elaborate discussion based on other studies that used brain measurements is needed (in my opinion) to put into perspective this conclusion. In particular, as I mentioned in the first point, the authors mainly discuss the potential neural substrate of early vision, however much has been done about the role of higher-tier cortical areas in causal inference e.g., see (Cao et al. 2019; Coen et al. 2021). 

      In the revised manuscript, we addressed the limitations of a purely psychophysical approach and acknowledged alternative implementations in the Discussion section.

      “Note that, while the present findings demonstrate direction-selectivity, it remains unclear where exactly that visual routine is located. As pointed out, it is also possible that the visual routine is located higher up in the visual system (or distributed across multiple levels) and is only using a directional-selective population response as input.”

      Moreover, we also cite the two suggested papers when referring to the role of cortical areas in causal inference (Cao et al., 2019; Coen et al., 2023):

      “Neurophysiological studies support the view of distributed neural processing underlying sensory causal interactions with the visual system playing a major role. Imaging studies in particular revealed a network for the perception of causality that is also involved in action observation (Blakemore et al., 2003; Fonlupt, 2003; Fugelsang et al., 2005; Roser et al., 2005). The fact that visual adaptation of causality occurs in a retinotopic reference frame emphasizes the role of retinotopically organized areas within that network (e.g., V5 and the superior temporal sulcus). Interestingly, single cell recordings in area F5 of the primate brain revealed that motor areas are contributing to the perception of causality (Caggiano et al., 2016; Rolfs, 2016), emphasizing the distributed nature of the computations underlying causal interactions, and also stressing that the detection, and the prediction, of causality is essential for processes outside purely sensory systems (e.g., for understanding others’ actions, for navigating, and for avoiding collisions). The neurophysiological underpinnings in causal inference further extend the candidate cortical areas that might contribute to the detection of causal relations, emphasizing the role of the frontal cortex for the flexible integration of multisensory representations (Cao et al., 2019; Coen et al., 2023).”

      There were many areas in this manuscript that I liked: clever questions, experimental design, and statistical analysis.

      Thank you so much.

      Reviewer #1 (Recommendations for the authors):

      I congratulate the authors again on their manuscript and hope they will find my review helpful. Most of my notes are suggestions to the authors, and I hope will help them to improve the manuscript. None are intended to devalue their (interesting) work. 

      We would like to thank the reviewer for their thoughtful and encouraging comments.

      In the following, I use pX-lY template to refer to a particular page number, say page number X (pX), and line number, say line number Y (lY). 

      Major concerns and suggestions 

      - I would suggest simplifying the abstract and significance statement or putting more background in it. It's hard (at least for me) to understand if one is not familiar with the task used in this study. 

      We followed the reviewer’s suggestion and added more background in the beginning of the abstract. 

      We made the following changes:

      “Detecting causal relations structures our perception of events in the world. Here, we determined for visual interactions whether generalized (i.e., feature-invariant) or specialized (i.e., feature-selective) visual routines underlie the perception of causality. To this end, we applied a visual adaptation protocol to assess the adaptability of specific features in classical launching events of simple geometric shapes. We asked observers to report whether they observed a launch or a pass in ambiguous test events (i.e., the overlap between two discs varied from trial to trial). After prolonged exposure to causal launch events (the adaptor) defined by a particular set of features (i.e., a particular motion direction, motion speed, or feature conjunction), observers were less likely to see causal launches in subsequent ambiguous test events than before adaptation. Crucially, adaptation was contingent on the causal impression in launches as demonstrated by a lack of adaptation in non-causal control events. We assessed whether this negative aftereffect transfers to test events with a new set of feature values that were not presented during adaptation. Processing in specialized (as opposed to generalized) visual routines predicts that the transfer of visual adaptation depends on the feature-similarity of the adaptor and the test event. We show that negative aftereffects do not transfer to unadapted launch directions but do transfer to launch events of different speed. Finally, we used colored discs to assign distinct feature-based identities to the launching and the launched stimulus. We found that the adaptation transferred across colors if the test event had the same motion direction as the adaptor. In summary, visual adaptation allowed us to carve out a visual feature space underlying the perception of causality and revealed specialized visual routines that are tuned to a launch’s motion direction.”

      - The authors highlight the importance of studying causal inference and understanding the underlying mechanisms by probing adaptation, however, their introduction justifying that is, in my humble opinion, quite short. Perhaps in the cited paper, this is discussed extensively, but I'd suggest providing some elaboration in the manuscript. Otherwise, the study would be very specific to certain visual phenomena, rather than general mechanisms.  

      We have carefully considered the reviewer’s set of comments and concerns (e.g., the role of top-down influences, the contributions of the frontal cortex, and illustration of the computational level). They all appear to share the theme that the reviewer looks at our study from the perspective of Bayesian inference. We conducted the current study in the tradition of classical phenomena in the field of the perception of causality (in the tradition of Michotte, 1963 and as reviewed in Scholl & Tremoulet, 2000) which aims to uncover the relevant visual parameters and rules for detecting causal relations in the visual domain. Indeed, we think that a causal inference perspective promises a lot of new insights into the mechanisms underlying the classical phenomena described for the perception of causality. In the revised manuscript, we therefore discuss causal inference and how it relates to the current study. We now emphasize a) that we used visual adaptation to reveal the bottom-up processes that allow for the detection of a causal interaction in the visual domain, b) that the perception of causality also integrates signals from other domains (which we do not study here), and c) that the neural substrates underlying the perception of causality might be best described by a distributed network. By discussing Bayesian causal inference, we point out promising avenues for future research that may bridge the fields of the perception of causality and Bayesian causal inference. However, we also emphasize that perceived causality and judged causality can be dissociated (Schlottmann & Shanks, 1992).

      We added the following discussion:

      “We used visual adaptation to carve out a bottom-up visual routine for detecting causal interactions in form of launching events. However, we know that more complex behaviors of perceiving causal relations can result from integrating information across space (e.g., in causal capture; Scholl & Nakayama, 2002), across time (postdictive influence; Choi & Scholl, 2006), and across sensory modalities (Sekuler, Sekuler, & Lau, 1997). Bayesian causal inference has been particularly successful as a normative framework to account for multisensory integration (Körding et al., 2007; Shams & Beierholm, 2022). In that framework, the evidence for a common-cause hypothesis is competing with the evidence for an independent-causes hypothesis (Shams & Beierholm, 2022). The task in our experiments could be similarly formulated as two competing hypotheses for the second disc’s movement (i.e., the movement was caused by the first disc vs. the second disc did not move). This framework also emphasizes the distributed nature of the neural implementation for solving such inferences, showing the contributions of parietal and frontal areas in addition to sensory processing (for review see Shams & Beierholm, 2022). Moreover, even visual adaptation to contrast in mouse primary visual cortex is influenced by top-down factors such as behavioral relevance— suggesting a complex implementation of the observed adaptation results (Keller et al. 2017). The present experiments, however, presented purely visual events that do not require an integration across processing domains. Thus, the outcome of our suggested visual routine can provide initial evidence from within the visual system for a causal relation in the environment that may then be integrated with signals from other domains (e.g., auditory signals). Determining exactly how the perception of causality relates to mechanisms of causal inference and the neural implementation thereof is an exciting avenue for future research. 
      Note, however, that perceived causality can be distinguished from judged causality: Even when participants are aware that a third variable (e.g., a color change) is the best predictor of the movement of the second disc in launching events, they still perceive the first disc as causing the movement of the second disc (Schlottmann & Shanks, 1992).”
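      The two-hypothesis formulation mentioned in the passage above can be illustrated with a minimal sketch of Bayes' rule in Python. This is only a toy illustration of the normative framework, not a model fitted or proposed in the manuscript; the function name, the likelihood values, and the prior are hypothetical.

```python
def posterior_caused(lik_caused, lik_alternative, prior_caused=0.5):
    """Posterior probability that the first disc caused the second disc's movement.

    lik_caused:      P(sensory evidence | first disc caused the movement)
    lik_alternative: P(sensory evidence | alternative, non-causal hypothesis)
    prior_caused:    prior probability of the causal hypothesis
    All three inputs are illustrative assumptions.
    """
    evidence_caused = lik_caused * prior_caused
    evidence_alternative = lik_alternative * (1.0 - prior_caused)
    return evidence_caused / (evidence_caused + evidence_alternative)

# Evidence that is four times more likely under the causal hypothesis,
# combined with a flat prior, and then with a prior that disfavors causation.
p_flat = posterior_caused(0.8, 0.2)
p_skeptical = posterior_caused(0.8, 0.2, prior_caused=0.3)
print(p_flat, p_skeptical)
```

      With the same sensory evidence, lowering the prior for the causal hypothesis lowers its posterior, which is the sense in which the two hypotheses compete in this framework.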

      - I'd suggest setting the context at the outset: your study of causal inference in the brain specifically targets the visual domain. If you like, the discussion could then connect it better to general ideas about causal inference in the brain (like the works by Ladan Shams and colleagues). 

      We would like to thank the reviewer for this comment. We followed the reviewer’s suggestion and made clear from the beginning that this paper is about the detection of causal relations in the visual domain. In the revised manuscript we write:

      “Here, we will study the mechanisms underlying the computations of causal interactions in the visual domain by capitalizing on visual adaptation of causality (Kominsky & Scholl, 2020; Rolfs et al., 2013). Adaptation is a powerful behavioral tool for discovering and dissecting a visual mechanism (Kohn, 2007; Webster, 2015) that provides an intriguing testing ground for the perceptual roots of causality.”

      As described in our reply to the previous comment, we now also discuss the ideas about causal inference.

      - To better illustrate the implication of your study on the computational level, I'd suggest putting it in the context of recent approaches to perception (point 2 of my public review). I think this is also aligned with the comment of Reviewer#3 on your line 32 (recommendation for authors).  

      In the revised manuscript, we now discuss the role of top-down influences in causal inference when addressing point 2 of the reviewer’s public review.

      Minor concerns and suggestions 

      - On p2-l3, I'd suggest providing a few examples for generalized and/or specialized visual routines (given the importance of the abstract). I only got it halfway through the introduction. 

      We thank the reviewer for highlighting the need to better introduce the concept of a visual routine. We have chosen the term visual routine to emphasize that we locate the part of the mechanism that is affected by the adaptation in our experiments in the visual system. At the same time, the concept leaves space with respect to the extent to which the mechanism further involves mid- and higher-level processes. In the revised manuscript, we now refer to Ullman (1987) who introduced the concept of a visual routine—the idea of a modular operation that sequentially processes spatial and feature information. Moreover, we refer to the concept of attentional sprites (Cavanagh, Labianca, & Thornton, 2001)—attention-based visual routines that allow the visual system to semi-independently handle complex visual tasks (e.g., identifying biological motion).

      We add the following footnote to the introduction:

      “We use the term visual routine here to highlight that our adaptation experiments can reveal a causality detection mechanism that resides in the visual system. At the same time, calling it a routine emphasizes similarities with a local, semi-independent operation (e.g., the recognition of familiar motion patterns; see also Ullman, 1987; Cavanagh, Labianca, & Thornton, 2001) that can engage mid- and higher-level processes (e.g., during causal capture, Scholl & Nakayama, 2002; or multisensory integration, Körding et al., 2007).”

      In the abstract we now write:

      “Here, we determined for visual interactions whether generalized (i.e., feature-invariant) or specialized (i.e., feature-selective) visual routines underlie the perception of causality.”

      - On p4-l31, I'd suggest mentioning the Matlab version. I have experienced differences across different versions of Matlab (minor but still ...). 

      We added the Matlab version.

      - On p6-l46 OSF-link is missing (that contains data and code). 

      Thank you. We made the OSF repository public and added the link to the revised manuscript.

      We added the following information to the revised manuscript.

      “The data analysis code has been deposited at the Open Science Framework and is publicly available at https://osf.io/x947m/.”

      Reviewer #2 (Public Review):

      This paper seeks to determine whether the human visual system's sensitivity to causal interactions is tuned to specific parameters of a causal launching event, using visual adaptation methods. The three parameters the authors investigate in this paper are the direction of motion in the event, the speed of the objects in the event, and the surface features or identity of the objects in the event (in particular, having two objects of different colors). The key method, visual adaptation to causal launching, has now been demonstrated by at least three separate groups and seems to be a robust phenomenon. Adaptation is a strong indicator of a visual process that is tuned to a specific feature of the environment, in this case launching interactions. Whereas other studies have focused on retinotopically specific adaptation (i.e., whether the adaptation effect is restricted to the same test location on the retina as the adaptation stream was presented to), this one focuses on feature specificity. 

      The first experiment replicates the adaptation effect for launching events as well as the lack of an adaptation effect for a minimally different non-causal 'slip' event. However, it also finds that the adaptation effect does not work for launching events whose direction of motion differs by more than 30 degrees from the direction of the test event. The interpretation is that the system that is being adapted is sensitive to the direction of this event, which is an interesting and somewhat puzzling result given the methods used in previous studies, which have used random directions of motion for both adaptation and test events. 

      The obvious interpretation would be that past studies have simply adapted to launching in every direction, but that in itself says something about the nature of this direction-specificity: it is not working through opposed detectors. For example, in something like the waterfall illusion adaptation effect, where extended exposure to downward motion leads to illusory upward motion on neutral-motion stimuli, the effect simply doesn't work if motion in two opposed directions is shown (i.e., you don't see illusory motion in both directions, you just see nothing). The fact that adaptation to launching in multiple directions doesn't seem to cancel out the adaptation effect in past work raises interesting questions about how directionality is being coded in the underlying process. 

      We would like to thank the reviewer for that thoughtful comment. We added the described implication to the manuscript:

      “While the present study demonstrates direction-selectivity for the detection of launches, previous adaptation protocols demonstrated successful adaptation using adaptors with random motion direction (Rolfs et al., 2013; Kominsky & Scholl, 2020). These results therefore suggest independent direction-specific routines, in which adaptation to launches in one direction does not counteract an adaptation to launches in the opposite direction (as for example in opponent color coding).”

      In addition, one limitation of the current method is that it's not clear whether the motion direction-specificity is also itself retinotopically-specific, that is, if one retinotopic location were adapted to launching in one direction and a different retinotopic location adapted to launching in the opposite direction, would each test location show the adaptation effect only for events in the direction presented at that location? 

      This is an interesting idea! Because previous adaptation studies consistently showed retinotopic adaptation of causality, we would not expect to find transfer of directional tuning for launches to other locations. We agree that the suggested experiment on testing the reference frame of directional specificity constitutes an interesting future test of our findings.

      The second experiment tests whether the adaptation effect is similarly sensitive to differences in speed. The short answer is no; adaptation events at one speed affect test events at another. Furthermore, this is not surprising given that Kominsky & Scholl (2020) showed adaptation transfer between events with differences in speeds of the individual objects in the event (whereas all events in this experiment used symmetrical speeds). This experiment is still novel and it establishes that the speed-insensitivity of these adaptation effects is fairly general, but I would certainly have been surprised if it had turned out any other way. 

      We thank the reviewer for highlighting the link to an experiment reported in Kominsky & Scholl (2020). We report the finding of that experiment now in the revised manuscript.

      We added the following paragraph in the discussion:

      “For instance, we demonstrated a transfer of adaptation across speed for symmetrical speed ratios. This result complements a previous finding that reported that the adaptation to triggering events (with an asymmetric speed ratio of 1:3) resulted in significant retinotopic adaptation of ambiguous (launching) test events of different speed ratios (i.e., test events with a speed ratio of 1:1 and of 1:3; Kominsky & Scholl, 2020).”

      The third experiment tests color (as a marker of object identity), and pits it against motion direction. The results demonstrate that adaptation to red-launching-green generates an adaptation effect for green-launching-red, provided they are moving in roughly the same direction, which provides a nice internal replication of Experiment 1 in addition to showing that the adaptation effect is not sensitive to object identity. This result forms an interesting contrast with the infant causal perception literature. Multiple papers (starting with Leslie & Keeble, 1987) have found that 6-8-month-old infants are sensitive to reversals in causal roles exactly like the ones used in this experiment. The success of adaptation transfer suggests, very clearly, that this sensitivity is not based only on perceptual processing, or at least not on the same processing that we access with this adaptation procedure. It implies that infants may be going beyond the underlying perceptual processes and inferring genuine causal content. This is also not the first time the adaptation paradigm has diverged from infant findings: Kominsky & Scholl (2020) found a divergence with the object speed differences as well, as infants categorize these events based on whether the speed ratio (agent:patient) is physically plausible (Kominsky et al., 2017), while the adaptation effect transfers from physically implausible events to physically plausible ones. This only goes to show that these adaptation effects don't exhaustively capture the mechanisms of early-emerging causal event representation. 

      We would like to thank the reviewer for highlighting the similarities (and differences) to the seminal study by Leslie and Keeble (1987). We included a discussion with respect to that paper in the revised manuscript. Indeed, that study showed a recovery from habituation to launches after reversal of the launching events. In their study, the reversal condition resulted in a change of two aspects: 1) motion direction and 2) which color is linked to either cause (i.e., agent) or effect (i.e., patient). Our study, based on visual adaptation in adults, suggests that switching the two colors is not necessary for a recovery from the habituation, provided the motion direction is reversed. Importantly, the reversal of the motion direction only affected the perception of causality after adapting to launches (but not to slip events), which is consistent with Leslie and Keeble’s (1987) finding that the effect of a reversal is contingent on habituation/adaptation to a causal relationship (and is not observed for non-causal delayed launches). Based on our findings, we predict that switching colors without changing the event’s motion direction would not result in a recovery from habituation. Obviously, for infants, color may play a more important role for establishing an object identity than it does for adults, which could explain potential differences. We also agree with the reviewer’s point that the adaptation protocol might tap into different mechanisms than revealed by habituation studies in infants (e.g., Kominsky et al., 2017 vs. Kominsky & Scholl, 2020). 

      We revised the manuscript accordingly when discussing the role of direction selectivity in our study:

      “Habituation studies in six-month-old infants also demonstrated that the reversal of a launch resulted in a recovery from habituation to launches (while a non-causal control condition of delayed launches did not; Leslie & Keeble, 1987). In their study, the reversal of motion direction was accompanied by a reversal of the color assignment to the cause-effect relationship. In contrast, our findings suggest that, in adults, color does not play a major role in the detection of a launch. Future studies should further delineate similarities and differences obtained from adaptation studies in adults and habituation studies in children (e.g., Kominsky et al., 2017; Kominsky & Scholl, 2020).”

      One overarching point about the analyses to take into consideration: The authors use a Bayesian psychometric curve-fitting approach to estimate a point of subjective equality (PSE) in different blocks for each individual participant based on a model with strong priors about the shape of the function and its asymptotic endpoints, and this PSE is the primary DV across all of the studies. As discussed in Kominsky & Scholl (2020), this approach has certain limitations, notably that it can generate nonsensical PSEs when confronted with relatively extreme response patterns. The authors mentioned that this happened once in Experiment 3 and that a participant had to be replaced. An alternate approach is simply to measure the proportion of 'pass' reports overall to determine if there is an adaptation effect. I don't think this alternate analysis strategy would greatly change the results of this particular experiment, but it is robust against this kind of self-selection for effects that fit in the bounds specified by the model, and may therefore be worth including in a supplemental section or as part of the repository to better capture the individual variability in this effect. 

      We largely agree with these points. Indeed, we adopted the non-parametric analysis for a recent series of experiments in which the psychometric curves were more variable (Ohl & Rolfs, Vision Sciences Society Meeting 2024). In the present study, however, the model fits were very convincing. In Figures S1, S2 and S3 we show the model fits for each individual observer and condition on top of the mean proportion of launch reports. The inferential statistics based on the points of subjective equality, therefore, allowed us to report our findings very concisely.
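
      The model-free alternative raised by the reviewer can be sketched in a few lines. This is only an illustration: the function names and the example reports are hypothetical and are not taken from the study's data or analysis code, and the sketch simply compares the overall proportion of 'pass' reports before and after adaptation.

```python
from typing import Sequence


def proportion_pass(reports: Sequence[str]) -> float:
    """Return the fraction of trials on which the observer reported 'pass'."""
    if not reports:
        raise ValueError("need at least one report")
    return sum(r == "pass" for r in reports) / len(reports)


def adaptation_effect(before: Sequence[str], after: Sequence[str]) -> float:
    """Change in the proportion of 'pass' reports from before to after adaptation."""
    return proportion_pass(after) - proportion_pass(before)


# Made-up reports for a single observer (illustration only, not study data).
before_adaptation = ["launch", "launch", "pass", "launch",
                     "pass", "launch", "launch", "pass"]
after_adaptation = ["pass", "launch", "pass", "pass",
                    "pass", "launch", "pass", "pass"]

effect = adaptation_effect(before_adaptation, after_adaptation)
print(f"Change in proportion of 'pass' reports: {effect:+.3f}")
```

      Because no curve is fitted, this measure cannot produce an out-of-range PSE for extreme response patterns, although it no longer describes where along the overlap axis the percept changes.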

      In general, this paper adds further evidence for something like a 'launching' detector in the visual system, but beyond that, it specifies some interesting questions for future work about how exactly such a detector might function. 

      We thank the reviewer for this positive overall assessment.

      Reviewer #2 (Recommendations for the authors):

      Generally, the paper is great. The questions I raised in the public review don't need to be answered at this time, but they're exciting directions for future work. 

      We would like to thank the reviewer for the encouraging comments and thoughtful ideas on how to improve the manuscript.

      I would have liked to see a little more description of the model parameters in the text of the paper itself just so readers know what assumptions are going into the PSE estimation. 

      We followed the reviewer’s suggestion and added more information regarding the parameter space (i.e., ranges of possible parameters of the logistic model) that we used for obtaining the model fits. 

      Specifically, we added the following information in the manuscript:

      “For model fitting, we constrained the range of possible estimates for each parameter of the logistic model. The lower asymptote for the proportion of reported launches was constrained to be in the range 0–0.75, and the upper asymptote in the range 0.25–1. The intercept of the logistic model was constrained to be in the range 1–15, and the slope was constrained to be in the range –20 to –1.”

      The models provided very good fits, as can be appreciated from the fits per individual and experimental condition, which we provide in response to the public comments. Please note that all data and analysis scripts are available at the Open Science Framework (https://osf.io/x947m/).
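
      To make the quoted parameter ranges concrete, the following sketch writes out one possible four-parameter logistic and checks a parameter set against those ranges. Several things here are assumptions rather than details confirmed by the text: the exact parametrization, the expression of disc overlap as a proportion between 0 and 1, and the definition of the PSE as the overlap at which the predicted proportion of launch reports equals 0.5. The function names are hypothetical, and the Bayesian fitting procedure itself is not reproduced.

```python
import math

# Parameter ranges as quoted in the manuscript text above.
BOUNDS = {
    "lower": (0.0, 0.75),      # lower asymptote of the launch-report proportion
    "upper": (0.25, 1.0),      # upper asymptote
    "intercept": (1.0, 15.0),  # intercept of the logistic model
    "slope": (-20.0, -1.0),    # negative: more overlap, fewer launch reports
}


def within_bounds(params: dict) -> bool:
    """Check that every parameter lies inside its allowed range."""
    return all(lo <= params[name] <= hi for name, (lo, hi) in BOUNDS.items())


def p_launch(overlap: float, lower: float, upper: float,
             intercept: float, slope: float) -> float:
    """Predicted proportion of launch reports (assumed parametrization)."""
    z = intercept + slope * overlap
    return lower + (upper - lower) / (1.0 + math.exp(-z))


def pse(lower: float, upper: float, intercept: float, slope: float) -> float:
    """Overlap at which the predicted launch proportion is 0.5 (assumed definition)."""
    if not lower < 0.5 < upper:
        raise ValueError("the curve never crosses 0.5")
    q = (0.5 - lower) / (upper - lower)
    return (math.log(q / (1.0 - q)) - intercept) / slope


# Hypothetical parameter values chosen for illustration only.
params = {"lower": 0.0, "upper": 1.0, "intercept": 5.0, "slope": -10.0}
print(within_bounds(params), pse(**params))
```

      Under this assumed parametrization, a PSE is only defined when the two asymptotes bracket 0.5, which is one way that extreme response patterns can yield PSEs that are difficult to interpret.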

      I also have a recommendation about Figure 1b: Color-code "Feature A", "Feature B", and "Feature C" and match those colors with the object identity/speed/direction text. I get what the figure is trying to convey but to a naive reader there's a lot going on and it's hard to interpret. 

      We followed the reviewer’s suggestion and revised the visualization accordingly.

      If you have space, figures showing the adaptation and corresponding test events for each experimental manipulation would also be great, particularly since the naming scheme of the conditions is (necessarily) not entirely consistent across experiments. It would be a lot of little figures, I know, but to people who haven't spent as long staring at these displays as we have, they're hard to envision based on description alone. 

      We followed the reviewer’s recommendation and added a visualization of the adaptor and the test events for the different experiments in Figure 2.

      Reviewer #3 (Public Review):

      We thank the reviewer for their thoughtful comments, which we carefully addressed to improve the revised manuscript. 

      Summary: 

      This paper presents evidence from three behavioral experiments that causal impressions of "launching events", in which one object is perceived to cause another object to move, depend on motion direction-selective processing. Specifically, the work uses an adaptation paradigm (Rolfs et al., 2013), presenting repetitive patterns of events matching certain features to a single retinal location, then measuring subsequent perceptual reports of a test display in which the degree of overlap between two discs was varied, and participants could respond "launch" or "pass". The three experiments report results of adapting to motion direction, motion speed, and "object identity", and examine how the psychometric curves for causal reports shift in these conditions depending on the similarity of the adapter and test. While causality reports in the test display were selective for motion direction (Experiment 1), they were not selective for adapter-test speed differences (Experiment 2) nor for changes in object identity induced via color swap (Experiment 3). These results support the notion that causal perception is computed (in part) at relatively early stages of sensory processing, possibly even independently of or prior to computations of object identity. 

      Strengths: 

      The setup of the research question and hypotheses is exceptional. The experiments are carefully performed (appropriate equipment, and careful control of eye movements). The slip adaptor is a really nice control condition and effectively mitigates the need to control motion direction with a drifting grating or similar. Participants were measured with sufficient precision, and a power curve analysis was conducted to determine the sample size. Data analysis and statistical quantification are appropriate. Data and analysis code are shared on publication, in keeping with open science principles. The paper is concise and well-written. 

      Weaknesses: 

      The biggest uncertainty I have in interpreting the results is the relationship between the task and the assumption that the results tell us about causality impressions. The experimental logic assumes that "pass" reports are always non-causal impressions and "launch" reports are always causal impressions. This logic is inherited from Rolfs et al (2013) and Kominsky & Scholl (2020), who assert rather than measure this. However, other evidence suggests that this assumption might not be solid (Bechlivanidis et al., 2019). Specifically, "[our experiments] reveal strong causal impressions upon first encounter with collision-like sequences that the literature typically labels "non-causal"" (Bechlivanidis et al., 2019) -- including a condition that is similar to the current "pass". It is therefore possible that participants' "pass" reports could also involve causal experiences. 

      We agree with the reviewer that our study assumes that the launch-pass dichotomy can be mapped onto a dimension of causal to non-causal impressions. Please note that the choice for this launch-pass task format was intentional. We consider it an advantage that subjects do not have to report causal vs non-causal impressions directly, as it allows us to avoid the often-criticized decision biases that come with asking participants about their causal impression (Joynson, 1971; for a discussion see Choi & Scholl, 2006). This obviously comes at the cost that participants did not directly report their causal impression in our experiments. There is, however, evidence that increasing overlap between the discs monotonically decreases the causal impression when directly asking participants to report their causal impression (Scholl & Nakayama, 2004). We believe, therefore, that the assumption of a mapping between launches-to-passes and causal-to-non-causal is well-justified. At the same time, the expressed concern emphasizes the need to develop further, possibly implicit measures for causal impressions (see Völter & Huber, 2021).

      However, as pointed out by the reviewer, a recent paper demonstrated that on first encounter participants can have impressions in response to a pass event that are different from clearly non-causal impressions (Bechlivanidis et al., 2019). As demonstrated in the same paper, displaying a canonical launch decreased the impression of causality when seeing pass events in subsequent trials. In our study, participants completed an entire training session before running the main experiments. It is therefore reasonable to expect that participants observed passes as non-causal events given the presence of clear causal references. Nevertheless, we now acknowledge this concern directly in the revised manuscript.

      We added the following paragraph to the discussion:

      “In our study, we assessed causal perception by asking observers to report whether they observed a launch or a pass in events of varying ambiguity. This method assumes that launches and passes can be mapped onto a dimension that ranges from causal to non-causal impressions. It has been questioned whether pass events are a natural representative of non-causal events: Observers often report high impressions of causality upon first exposure to pass events, which then decreased after seeing a canonical launch (Bechlivanidis, Schlottmann, & Lagnado, 2019). In our study, therefore, participants completed a separate session that included canonical launches before starting the main experiment.”

      Furthermore, since the only report options are "launch" or "pass", it is also possible that "launch" reports are not indications of "I experienced a causal event" but rather "I did not experience a pass event". It seems possible to me that different adaptation transfer effects (e.g. selectivity to motion direction, speed, or color-swapping) change the way that participants interpret the task, or the uncertainty of their impression. For example, it could be that adaptation increases the likelihood of experiencing a "pass" event in a direction-selective manner, without changing causal impressions. Increases of "pass" impressions (or at least, uncertainty around what was experienced) would produce a leftward shift in the PSE as reported in Experiment 1, but this does not necessarily mean that experiences of causal events changed. Thus, changes in the PSEs between the conditions in the different experiments may not directly reflect changes in causal impressions. I would like the authors to clarify the extent to which these concerns call their conclusions into question. 

      Indeed, PSE shifts are subject to cognitive influences and can even be voluntarily shifted (Morgan et al., 2012). We believe that decision biases (e.g., reporting the presence of a launch before adaptation vs. reporting the absence of a pass after the adaptation) are unlikely to explain the high specificity of aftereffects observed in the current study. While such aftereffects are very typical of visual processing (Webster, 2015), it is unclear how a mechanism that increases the likelihood of perceiving a pass could account for the retinotopy of adaptation to launches (Rolfs et al., 2013) or the recently reported selective transfer of adaptation for only some causal categories (Kominsky et al., 2020). The latter authors revealed a transfer of adaptation from triggering to launching, but not from entraining events to launching. Based on these arguments, we decided not to include this point in the revised manuscript.

      Leaving these concerns aside, I am also left wondering about the functional significance of these specialised mechanisms. Why would direction matter but speed and object identity not? Surely object identity, in particular, should be relevant to real-world interpretations and inputs of these visual routines? Is color simply too weak an identity? 

      We agree that it would be beneficial to have mechanisms in place that are specific for certain object identities. Overall, our results fit very well with established claims that only spatiotemporal parameters mediate the perception of causality (Michotte, 1963; Leslie, 1984; Scholl & Tremoulet, 2000). We have now explicitly listed these references again in the revised manuscript. It is important to note that an understanding of a causal relation could suffice to track identity information based purely on spatiotemporal contingencies, neglecting distinguishing surface features.

      We revised the manuscript and state:

      “Our findings therefore provide additional support for the claim that an event’s spatiotemporal parameters mediate the perception of causality (Michotte, 1963; Leslie, 1984; Scholl & Tremoulet, 2000).”

      Moreover, we think our findings of directional selectivity have functional relevance. First, direction-selective detection of collisions allows for an adaptation that occurs separately for each direction. That means that the visual system can calibrate these visual routines for detecting causal interactions in response to real-world statistics that reflect differences in directions. For instance, due to gravity, objects will simply fall to the ground. Causal relations such as launches are likely to be more frequent in horizontal directions, along a stable ground. Second, we think that causal visual events are action-relevant, that is, acting on (potentially) causal events promises an advantage (e.g., avoiding a collision, or quickly catching an object that has been pushed away). The faster we can detect such causal interactions, the faster we can react to them. Direction-selective motion signals are available in the first stages of visual processing. Visual routines that are based on these direction-selective motion signals promise to enable such fast computations. Please note, however, that while our present findings demonstrate direction-selectivity, they do not pinpoint where exactly that visual routine is located. It is quite possible that the visual routine is located higher up in the visual system, relying on a direction-selective population response as input.

      We added these points to the discussion of the functional relevance: 

      “We suggest that at least two functional benefits result from a specialized visual routine for detecting causality. First, a direction-selective detection of launches allows adaptation to occur separately for each direction. That means that the visual system can automatically calibrate the sensitivity of these visual routines in response to real-world statistics. For instance, while falling objects drop vertically towards the ground, causal relations such as launches are common in horizontal directions moving along a stable ground. Second, we think that causal visual events are action-relevant, and the faster we can detect such causal interactions, the faster we can react to them. Direction-selective motion signals are available very early on in the visual system. Visual routines that are based on these direction-selective motion signals may enable faster detection. While our present findings demonstrate direction-selectivity, they do not pinpoint where exactly that visual routine is located. It is possible that the visual routine is located higher up in the visual system (or distributed across multiple levels), relying on a direction-selective population response as input.”

      Reviewer #3 (Recommendations for the authors):

      - The concept of "visual routines" is used without introduction; for a general-interest audience it might be good to include a definition and reference(s) (e.g. Ullman.). 

      Thank you very much for highlighting that point. We have chosen the term visual routine to emphasize that we locate the part of the mechanism that is affected by the adaptation in our experiments in the visual system, but at the same time it leaves space regarding the extent to which the mechanism further involves mid- and higher-level processes. The term thus has a clear reference to a visual routine by Ullman (1987). We have now addressed what we mean by visual routine, and we also included the reference in the revised manuscript.

      We added the following footnote to the introduction:

      “We use the term visual routine here to highlight that our adaptation experiments can reveal a causality detection mechanism that resides in the visual system. At the same time, calling it a routine emphasizes similarities with a local, semi-independent operation (e.g., the recognition of familiar motion patterns; see also Ullman, 1987; Cavanagh, Labianca, & Thornton, 2001) that can engage mid- and higher-level processes (e.g., during causal capture, Scholl & Nakayama, 2002; or multisensory integration, Körding et al., 2007).”

      - I would appreciate slightly more description of the phenomenology of the WW adaptors: is this Michotte's "entraining" event? Does it look like one disc shunts the other?  

      The stimulus differs from Michotte's entrainment event in both spatiotemporal parameters and phenomenology. We added videos for the launch, pass and slip events as Supplementary Material.

      Moreover, we described the slip event in the methods section:

      “In two additional sessions, we presented slip events as adaptors to control that the adaptation was specific for the impression of causality in the launching events. Slip events are designed to match the launching events in as many physical properties as possible while producing a very different, non-causal phenomenology. In slip events, the first peripheral disc also moves towards a stationary disc. In contrast to launching events, however, the first disc passes the stationary disc and stops only when it is adjacent to the opposite edge of the stationary disc. While slip events do not elicit a causal impression, they have the same number of objects and motion onsets, the same motion direction and speed, as well as the same spatial area of the event as launches.”

      In the revised manuscript, we added also more information on the slip event in the beginning of the results section. Importantly, the stimulus typically produces the impression of two independent movements and thus serves as a non-causal control condition in our study. Only anecdotally, some observers (not involved in this study) who saw the stimulus spontaneously described their phenomenology of seeing a slip event as a double step or a discus throw.

      We added the following description to the results section:

      “Moreover, we compared the visual adaptation to launches to a (non-causal) control condition in which we presented slip events as adaptor. In a slip event, the initially moving disc passes completely over the stationary disc, stops immediately on the other side, and then the initially stationary disc begins to move in the same direction without delay. Thus, the two movements are presented consecutively without a temporal gap. This stimulus typically produces the impression of two independent (non-causal) movements.”

      - In general more illustrations of the different conditions (similar to Figure 1c but for the different experimental conditions and adaptors) might be helpful for skim readers.  

      We followed the reviewer’s recommendation and added a visualization of the adaptor and the test events for the different experiments in Figure 2.

      - Were the luminances of the red and green balls in experiment 3 matched? Were participants checked for color anomalous vision?  

      Yes, we checked for color anomalous vision using the color test Tafeln zur Prüfung des Farbensinnes/Farbensehens (Kuchenbecker & Broschmann, 2016). We added that information to the manuscript. The red and green discs were not matched for luminance. We measured the luminance after the experiment (21 cd/m² for the green disc and 6 cd/m² for the red disc). Please note that the differences in luminance should not pose a problem for the interpretation of the results, as we see a transfer of the adaptation across the two different colors.

      We added the following information to the manuscript:

      “The red and green discs were not matched for luminance. Measurements obtained after the experiments yielded a luminance of 21 cd/m² for the green disc and 6 cd/m² for the red disc.”

      “All observers had normal or corrected-to-normal vision and color vision as assessed using the color test Tafeln zur Prüfung des Farbensinnes/Farbensehens (Kuchenbecker & Broschmann, 2016).”

      - Relationship of this work to the paper by Arnold et al. (2015). That paper suggested that some effects of adaptation of launching events could be explained by an adaptation of object shape, not by causality per se. It is superficially difficult to see how one could explain the present results from the perspective of object "squishiness" -- why would this be direction selective? In other words, the present results taken at face value call the "squishiness" explanation into question. The authors could consider an explanation to reconcile these findings in their discussion. 

      Indeed, the paper by Arnold and colleagues (2015) suggested that a contact-launch adaptor could lead to a squishiness aftereffect, arguing that the object elasticity changed in response to the adaptation. Importantly, the same study found an object-centered adaptation effect rather than a retinotopic adaptation effect. However, the retinotopic nature of the negative aftereffect as used in our study has been repeatedly replicated (for instance Kominsky & Scholl, 2020). Thus, the divergent results of Arnold and colleagues may have resulted from differences in the task (i.e., observers had to judge whether they perceived a soft vs. hard bounce), or the stimuli (i.e., bounces of a disc and a wedge, and the discs moving on a circular trajectory). It would be important to replicate these results first and then determine whether their squishiness effect would be direction-selective as well. We now acknowledge the study by Arnold and colleagues in the discussion:

      “The adaptation of causality is spatially specific to the retinotopic coordinates of the adapting stimulus (Kominsky & Scholl, 2020; Rolfs et al., 2013; for an object-centered elasticity aftereffect using a related stimulus on a circular motion path, see Arnold et al., 2015), suggesting that the detection of causal interactions is implemented locally in visual space.”

      - Line 32: "showing that a specialized visual routine for launching events exists even within separate motion direction channels". This doesn't necessarily mean the routine is within each separate direction channel, only that the output of the mechanism depends on the population response over motion direction. The critical motion computation could be quite high level -- e.g. global pattern motion in MST. Please clarify the claim. 

      We agree with the reviewer that it is also possible that critical parts of the visual routine could simply use the aggregated population response over motion direction at higher levels of processing. We acknowledge this possibility in the discussion of the functional relevance of the proposed mechanism and when suggesting that a distributed brain network may contribute to the perception of causality.

      We would like to highlight the following two revised paragraphs.

      “[…] Second, we think that causal visual events are action-relevant, and the faster we can detect such causal interactions, the faster we can react to them. Direction-selective motion signals are available very early on in the visual system. Visual routines that are based on these direction-selective motion signals may enable faster detection. While our present findings demonstrate direction-selectivity, they do not pinpoint where exactly that visual routine is located. It is possible that the visual routine is located higher up in the visual system (or distributed across multiple levels), relying on a direction-selective population response as input.”

      Moreover, when discussing the neurophysiological literature we write:

      “Interestingly, single cell recordings in area F5 of the primate brain revealed that motor areas are contributing to the perception of causality (Caggiano et al., 2016; Rolfs, 2016), emphasizing the distributed nature of the computations underlying causal interactions. This finding also stresses that the detection, and the prediction, of causality is essential for processes outside purely sensory systems (e.g., for understanding other’s actions, for navigating, and for avoiding collisions).”

      -  p. 10 line 30: typo "particual".  

      Done.

      -  p. 10 line 37: "This findings rules out (...)" should be singular "This finding rules out (...)". 

      Done.

      -  Spelling error throughout: "underly" should be "underlie". 

      Done.

      -  p.11 line 29: "emerges fast and automatic" should be "automatically". 

      Done.

    1. eLife Assessment

      Cichlid fishes have attracted attention from a wide range of biologists because of their extensive species diversification at the ecological and phenotypic levels. In this important study, the authors have partially revealed the mechanism behind lip thickening in cichlid fishes, which has evolved independently across three lakes in Africa. To explore this phenomenon, the authors used histological comparison, proteomics, and transcriptomics, all of which are well suited for their objectives. With compelling evidence, this contribution provides insights into parallel evolution in polygenic traits and holds significant value for the field.

    2. Reviewer #1 (Public review):

      Summary:

      Machii et al. reported a possible molecular mechanism underlying the parallel evolution of lip hypertrophy in African cichlids. The multifaceted approach taken in this manuscript is highly valued, as it uses histology, proteomics, and transcriptomics to reveal how phylogenetically distinct thick-lips have evolved in parallel. Findings from histology and proteomics connected to wnt signaling through the transcriptome are very exciting.

      Strengths:

      There is consistency between the results and it is possible to make a strong argument from the results.

      Comments on revised version:

      The issues I pointed out in the previous review have been carefully answered, and all issues have been addressed. The main points of the manuscript are clear, and the conclusions are easy to understand. The enlarged lips are a notable example of convergent evolution in African cichlids.

    1. eLife Assessment

      This important manuscript investigates the role of olfactory cues in Pieris brassicae larvae, focusing on their interactions with the host plant Brassica oleracea and the parasitoid wasp Cotesia glomerata. The authors' demonstration that impaired olfactory perception reduces caterpillar performance and increases susceptibility to parasitism is solid. These findings highlight the ecological significance of olfaction in mediating feeding behavior and predator avoidance in herbivorous insects.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript focuses on the olfactory system of Pieris brassicae larvae and the importance of olfactory information in their interactions with the host plant Brassica oleracea and the major parasitic wasp Cotesia glomerata. The authors used CRISPR/Cas9 to knock out the odorant receptor co-receptor (Orco), and conducted a comparative study on the behavior and olfactory system of the mutant and wild-type larvae. The study found that Orco-expressing olfactory sensory neurons in antennae and maxillary palps of Orco knockout (KO) larvae disappeared, and the number of glomeruli in the brain decreased, which impairs olfactory detection and primary processing in the brain. Orco KO caterpillars show weight loss and loss of preference for optimal food plants; KO larvae also lost weight when attacked by parasitoids with the ovipositor removed, and mortality increased when attacked by untreated parasitoids. On this basis, the authors further studied the responses of caterpillars to volatiles from plants attacked by the larvae of the same species and volatiles from plants on which the caterpillars were themselves attacked by parasitic wasps. Lack of OR-mediated olfactory inputs prevents caterpillars from finding suitable food sources and from choosing spaces free of enemies.

      Strengths:

      The findings help to understand the important role of olfaction in caterpillar feeding and predator avoidance, highlighting the importance of odorant receptor genes in shaping ecological interactions.

      Weaknesses:

      There are the following major concerns:

      (1) Possible non-targeted effects of Orco knockout using CRISPR/Cas9 should be analyzed and evaluated in Materials and Methods and Results.

      (2) Figure 1E: Only one olfactory receptor neuron was marked in WT. There are at least three olfactory sensilla at the top of the maxillary palp. Therefore, to explain the loss of Orco-expressing neurons in the mutant (Figure 1F), a more rigorous explanation of the photo is required.

      (3) In Figure 1G, H, the four glomeruli are circled by dotted lines: their corresponding relationship between the two figures needs to be further clarified.

      (4) Line 130: The main topic of this study is the olfactory system of larvae, whereas the experimental results in this part all concern antennal electrophysiological responses, mating frequency, and egg production of female and male adults of the wild type and the Orco KO mutant. The authors may therefore consider moving this part to the supplementary files. It would be better to include some data on the olfactory responses of larvae.

      (5) Line 166: The sentences in the text are about the choice test between "healthy plant vs. infested plant", while Fig 3C shows "infested plant vs. no plant". The content in the text does not match the figure.

      (6) Lines 174-178: Figure 3A showed that the body weight of Orco KO larvae in the absence of parasitic wasps also decreased compared with that of WT. Therefore, in the experiments of Figure 3A and E, the difference in the body weight of Orco KO larvae in the presence or absence of parasitic wasps without ovipositors should also be compared. The current data cannot determine whether the reduced weight of the KO mutant is due to the Orco knockout or the presence of parasitic wasps.

      (7) Lines 179-181: Figure 3F shows that the survival rate of larvae of Orco KO mutant decreased in the presence of parasitic wasps, and the difference in survival rate of larvae of WT and Orco KO mutant in the absence of parasitic wasps should also be compared. The current data cannot determine whether the reduced survival of the KO mutant is due to the Orco knockout or the presence of parasitic wasps.

      (8) In Figure 4B, why do the compounds tested include no volatiles derived from plants? Cruciferous plants have the well-known mustard bomb. The behavioral experiments did not include larval responses to ITC compounds; this should be explained in the discussion section.

      (9) The custom-made setup and the relevant behavioral experiments in Figure 4C need to be described in detail (Line 545).

      (10) Materials and Methods Line 448: 10 μL paraffin oil should be used for negative control.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript investigated the effect of olfactory cues on caterpillar performance and parasitoid avoidance in Pieris brassicae. The authors knocked out Orco to produce caterpillars with significantly reduced olfactory perception. These caterpillars showed reduced performance and increased susceptibility to a parasitoid wasp.

      Strengths:

      This is an impressive piece of work and a well-written manuscript. The authors have used multiple techniques to investigate not only the effect of the loss of olfactory cues on host-parasitoid interactions, but also the mechanisms underlying this.

      Weaknesses:

      I do have one major query regarding this manuscript - I agree that the results of the caterpillar choice tests in a y-maze give weight to the idea that olfactory cues may help them avoid areas with higher numbers of parasitoids. However, the experiments with parasitoids were carried out on a single plant. Given that caterpillars in these experiments were very limited in their potential movement and source of food - how likely is it that avoidance played a role in the results seen from these experiments, as opposed to simply the slower growth of the KO caterpillars extending their period of susceptibility? While the two mechanisms may well both take place in nature - only one suggests a direct role of olfaction in enemy avoidance at this life stage, while the other is an indirect effect, hence the distinction is important.

      My other issue was that determining the sample sizes used from the text was sometimes a bit confusing. (This was much clearer from the figures).

      I also couldn't find the test statistics for any of the statistical methods in the main text, or in the supplementary materials.

    4. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The manuscript focuses on the olfactory system of Pieris brassicae larvae and the importance of olfactory information in their interactions with the host plant Brassica oleracea and the major parasitic wasp Cotesia glomerata. The authors used CRISPR/Cas9 to knock out the odorant receptor co-receptor (Orco), and conducted a comparative study on the behavior and olfactory system of the mutant and wild-type larvae. The study found that Orco-expressing olfactory sensory neurons in antennae and maxillary palps of Orco knockout (KO) larvae disappeared, and the number of glomeruli in the brain decreased, which impairs olfactory detection and primary processing in the brain. Orco KO caterpillars show weight loss and loss of preference for optimal food plants; KO larvae also lost weight when attacked by parasitoids with the ovipositor removed, and mortality increased when attacked by untreated parasitoids. On this basis, the authors further studied the responses of caterpillars to volatiles from plants attacked by the larvae of the same species and volatiles from plants on which the caterpillars were themselves attacked by parasitic wasps. Lack of OR-mediated olfactory inputs prevents caterpillars from finding suitable food sources and from choosing spaces free of enemies.

      Strengths:

      The findings help to understand the important role of olfaction in caterpillar feeding and predator avoidance, highlighting the importance of odorant receptor genes in shaping ecological interactions.

      Weaknesses:

      There are the following major concerns:

      (1) Possible non-targeted effects of Orco knockout using CRISPR/Cas9 should be analyzed and evaluated in Materials and Methods and Results.

      Thank you for your suggestion. In the Materials and Methods, we mention how we selected the target region and evaluated potential off-target sites by Exonerate and CHOPCHOP. Neither of these methods found potential off-target sites with a more-than-17-nt alignment identity. Therefore, we assumed no off-target effect in our Orco KO. Furthermore, we did not find any developmental differences between WT and KO caterpillars when these were reared on leaf discs in Petri dishes (Fig S4). We will further highlight this information on the off-target evaluation in the Results section of our revised manuscript.

      (2) Figure 1E: Only one olfactory receptor neuron was marked in WT. There are at least three olfactory sensilla at the top of the maxillary palp. Therefore, to explain the loss of Orco-expressing neurons in the mutant (Figure 1F), a more rigorous explanation of the photo is required.

      Thank you for pointing this out. The figure shows only a qualitative comparison between WT and KO and we did not aim to determine the total number of Orco positive neurons in the maxillary palps or antennae of WT and KO caterpillars, but please see our previous work for the neuron numbers in the caterpillar antennae (Wang et al., 2023). We did indeed find more than one neuron in the maxillary palps, but as these were in very different image planes it was not possible to visualize them together. However, we will add a few sentences in the Results and Discussion section to explain the results of the maxillary palp Orco staining.

      (3) In Figure 1G, H, the four glomeruli are circled by dotted lines: their corresponding relationship between the two figures needs to be further clarified.

      Thank you for pointing this out. The four glomeruli in Figure 1G and 1H are not strictly corresponding. We circled these glomeruli to highlight them, as they are the best visualized and clearly shown in this view. In this study, we only counted the number of glomeruli in both WT and KO; we did not, however, clarify which glomeruli are missing in the KO caterpillar brain. We will further explain this in the figure legend.

      (4) Line 130: The main topic of this study is the olfactory system of larvae, whereas the experimental results in this part all concern antennal electrophysiological responses, mating frequency, and egg production of female and male adults of the wild type and the Orco KO mutant. The authors may therefore consider moving this part to the supplementary files. It would be better to include some data on the olfactory responses of larvae.

      Thank you for your suggestion. We do agree with your suggestion, and we will consider moving this part to the supplementary information. Regarding larval olfactory response, we unfortunately failed to record any spikes using single sensillum recordings due to the difficult nature of the preparation; however, we do believe that this would be an interesting avenue for further research.

      (5) Line 166: The sentences in the text are about the choice test between "healthy plant vs. infested plant", while Fig 3C shows "infested plant vs. no plant". The content in the text does not match the figure.

      Thank you for pointing this out. The sentence is “We compared the behaviors of both WT and Orco KO caterpillars in response to clean air, a healthy plant and a caterpillar-infested plant”. We tested these three stimuli in two comparisons: healthy plant vs no plant, infested plant vs no plant. The two comparisons are shown in Figure 3C separately. We will aim to describe this more clearly in the revised version of the manuscript.

      (6) Lines 174-178: Figure 3A showed that the body weight of Orco KO larvae in the absence of parasitic wasps also decreased compared with that of WT. Therefore, in the experiments of Figure 3A and E, the difference in the body weight of Orco KO larvae in the presence or absence of parasitic wasps without ovipositors should also be compared. The current data cannot determine whether the reduced weight of the KO mutant is due to the Orco knockout or the presence of parasitic wasps.

      Thank you for pointing this out. We did not make a comparison between the data of Figures 3A and 3E since the two experiments were not conducted at the same time due to the limited space in our BioSafety Ⅲ greenhouse. We do agree that the weight decrease in Figure 3E is partly due to the reduced caterpillar growth shown in Figure 3A. However, we are confident that the additional decrease in caterpillar weight shown in Figure 3E is mainly driven by the presence of disarmed parasitoids. To be specific, the average weight in Figure 3A is 0.4544 g for WT and 0.4230 g for KO, so KO weight is 93.1% of WT weight. In Figure 3E, the average weight is 0.4273 g for WT and 0.3637 g for KO, so KO weight is 85.1% of WT weight. We will discuss this interaction between caterpillar growth and the effect of the parasitoid attacks more extensively in the revised version of the manuscript.
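
      As a quick check of the percentages quoted above, the following minimal Python sketch recomputes the KO-to-WT weight ratio for each experiment from the four reported means. The dictionary labels and the function name are illustrative choices, not taken from the manuscript, and the sketch uses only the group means, so it says nothing about variance or statistical significance.

```python
# Mean caterpillar weights in grams, as quoted in the response above.
# The labels are illustrative; only the four numbers come from the text.
mean_weights = {
    "Figure 3A (no wasps)": {"WT": 0.4544, "KO": 0.4230},
    "Figure 3E (disarmed wasps)": {"WT": 0.4273, "KO": 0.3637},
}


def ko_percent_of_wt(wt_mean, ko_mean):
    """Return the KO mean weight expressed as a percentage of the WT mean."""
    return 100.0 * ko_mean / wt_mean


ratios = {
    label: ko_percent_of_wt(w["WT"], w["KO"])
    for label, w in mean_weights.items()
}

for label, ratio in ratios.items():
    print(f"{label}: KO weight is {ratio:.1f}% of WT")

# Gap between the two experiments, in percentage points.
extra_reduction = ratios["Figure 3A (no wasps)"] - ratios["Figure 3E (disarmed wasps)"]
print(f"Additional reduction with disarmed wasps: {extra_reduction:.1f} percentage points")
```

      Because the two experiments were run at different times, the gap between the two ratios is only indicative and should not be read as a formal test of the interaction.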

      (7) Lines 179-181: Figure 3F shows that the survival rate of larvae of Orco KO mutant decreased in the presence of parasitic wasps, and the difference in survival rate of larvae of WT and Orco KO mutant in the absence of parasitic wasps should also be compared. The current data cannot determine whether the reduced survival of the KO mutant is due to the Orco knockout or the presence of parasitic wasps.

      We are happy that you highlight this point. When conducting these experiments, we selected groups of caterpillars and carefully placed them on a leaf with minimal disturbance, which minimized injury and mortality. We did test the survival of caterpillars in the absence of parasitoid wasps in the experiment presented in Figure 3A, although this was missing from the manuscript. There is no significant difference in the survival rate of caterpillars between the two genotypes in the absence of wasps (average mortality WT = 8.8 %, average mortality KO = 2.9 %; P = 0.088, Wilcoxon test), so the decreased survival rate is most likely due to the attack of the wasps. We will add this information to the revised version of the manuscript.

      (8) In Figure 4B, why do the compounds tested include no volatiles derived from plants? Cruciferous plants have the well-known mustard bomb. The behavioral experiments did not include larval responses to ITC compounds; this should be explained in the discussion section.

      Thank you for the suggestion. We assume you mean Figure 4D/4E instead of Figure 4B. In Figure 4B, many of the identified chemical compounds are essentially plant volatiles, especially those from caterpillar frass and caterpillar spit. In Figure 4D/4E, most of the tested chemicals are derived from plants. We did include several ITCs in the butterfly EAG tests shown in Figure 2A/B. However, because the butterfly antennae did not respond strongly to ITCs, we did not include ITCs in the subsequent larval behavioural tests. Instead, the tested chemicals in Figure 4D/4E either elicit high EAG responses of butterflies or have been identified as significant by VIP scores in the chemical analyses. We will add this explanation to the revised version of our manuscript.

      (9) The custom-made setup and the relevant behavioral experiments in Figure 4C need to be described in detail (Line 545).

      We will add more detailed descriptions for the setup and method in the Materials and Methods.

      (10) Materials and Methods Line 448: 10 μL paraffin oil should be used for negative control.

      Thank you for pointing this out. We used both clean filter paper and clean filter paper with 10 μL paraffin oil as negative controls, but we did not find a significant difference between the two controls. Therefore, in the EAG results of Figure 2A/2B, we presented paraffin oil as one of the tested chemicals. We will re-run our statistical tests with paraffin oil as negative control, although we do not expect any major differences to the previous tests.

      Reviewer #2 (Public review):

      Summary:

      This manuscript investigated the effect of olfactory cues on caterpillar performance and parasitoid avoidance in Pieris brassicae. The authors knocked out Orco to produce caterpillars with significantly reduced olfactory perception. These caterpillars showed reduced performance and increased susceptibility to a parasitoid wasp.

      Strengths:

      This is an impressive piece of work and a well-written manuscript. The authors have used multiple techniques to investigate not only the effect of the loss of olfactory cues on host-parasitoid interactions, but also the mechanisms underlying this.

      Weaknesses:

      (1) I do have one major query regarding this manuscript - I agree that the results of the caterpillar choice tests in a y-maze give weight to the idea that olfactory cues may help them avoid areas with higher numbers of parasitoids. However, the experiments with parasitoids were carried out on a single plant. Given that caterpillars in these experiments were very limited in their potential movement and source of food - how likely is it that avoidance played a role in the results seen from these experiments, as opposed to simply the slower growth of the KO caterpillars extending their period of susceptibility? While the two mechanisms may well both take place in nature - only one suggests a direct role of olfaction in enemy avoidance at this life stage, while the other is an indirect effect, hence the distinction is important.

      We do agree with your comment that both mechanisms may be at work in nature, and we do address this in the Discussion section. In our study, we did find that wildtype caterpillars were more efficient in locating their food source and did grow faster on full plants than knockout caterpillars. This faster growth will enable wildtype caterpillars to more quickly outgrow the life-stages most vulnerable to the parasitoids (L1 and L2). The olfactory system therefore supports the escape from parasitoids indirectly by enhancing feeding efficiency directly.

      In addition, we show in our Y-tube experiments that WT caterpillars were able to avoid plants where conspecifics are under attack by parasitoids (Figure 3D). Therefore, we speculate that WT caterpillars make use of volatiles from the plant or from conspecifics via their spit or faeces to avoid plants or leaves potentially attracting natural enemies. Knockout caterpillars are unable to use these volatile danger cues and therefore do not avoid plants or leaves that are most attractive to their natural enemies, making KO caterpillars more susceptible and leading to more natural enemy harassment. Through this, olfaction also directly impacts the ability of a caterpillar to find an enemy-free feeding site.

      We think that olfaction supports the enemy avoidance of caterpillars via both these mechanisms, although at different time scales. Unfortunately, our analysis was not detailed enough to discern the relative importance of the two mechanisms we found. However, we feel that this would be an interesting avenue for further research. Moreover, we will sharpen our discussion on the potential importance of the two different mechanisms in the revised version of this manuscript.

      (2) My other issue was that determining the sample sizes used from the text was sometimes a bit confusing. (This was much clearer from the figures).

      We will revise how the sample sizes are reported in the text to make them clearer.

      (3) I also couldn't find the test statistics for any of the statistical methods in the main text, or in the supplementary materials.

      Thank you for pointing this out. We will provide more detailed test statistics in the main text and in the supplementary materials of the revised version of the manuscript.

    1. eLife Assessment

      This study presents a valuable open-source and cost-effective method for automating the quantification of male aggression and courtship in Drosophila melanogaster. The work as presented provides solid evidence that the use of the behavioral setup that the authors designed - using readily available laboratory equipment and standardised high-performing classifiers they developed using existing software packages - accurately and reliably characterises social behavior in Drosophila. The work will be of interest to Drosophila neurobiologists and particularly to those working on male social behaviors.

    2. Reviewer #1 (Public review):

      The study introduces an open-source, cost-effective method for automating the quantification of male social behaviors in Drosophila melanogaster. It combines machine-learning-based behavioral classifiers developed using JAABA (Janelia Automatic Animal Behavior Annotator) with inexpensive hardware constructed from off-the-shelf components. This approach addresses the limitations of existing methods, which often require expensive hardware and specialized setups. The authors demonstrate that their new "DANCE" classifiers accurately identify aggression (lunges) and courtship behaviors (wing extension, following, circling, attempted copulation, and copulation), closely matching manually annotated ground-truth data. Furthermore, DANCE classifiers outperform existing rule-based methods in accuracy. Finally, the study shows that DANCE classifiers perform as well when used with low-cost experimental hardware as with standard experimental setups across multiple paradigms, including RNAi knockdown of the neuropeptide Dsk and optogenetic silencing of dopaminergic neurons.

      The authors make creative use of existing resources and technology to develop an inexpensive, flexible, and robust experimental tool for the quantitative analysis of Drosophila behavior. A key strength of this work is the thorough benchmarking of both the behavioral classifiers and the experimental hardware against existing methods. In particular, the direct comparison of their low-cost experimental system with established systems across different experimental paradigms is compelling. While JAABA-based classifiers have been previously used to analyze aggression and courtship (Tao et al., J. Neurosci., 2024; Sten et al., Cell, 2023; Chiu et al., Cell, 2021; Ishii et al., eLife, 2020; Duistermars et al., Neuron, 2018), the demonstration that they work as well without expensive experimental hardware opens the door to more low-cost systems for quantitative behavior analysis.

      Although the study provides a detailed evaluation of DANCE classifier performance, its conclusions would be strengthened by a more comprehensive analysis. The authors assess classifier accuracy using a bout-level comparison rather than a frame-level analysis, as employed in previous studies (Kabra et al., Nat Methods, 2013). They define a true positive as any instance where a DANCE-detected bout overlaps with a manually annotated ground-truth bout by at least one frame. This criterion may inflate true positive rates and underestimate false positives, particularly for longer-duration courtship behaviors. For example, a 15-second DANCE-classified wing extension bout that overlaps with ground truth for only one frame would still be considered a true positive. A frame-level performance analysis would help address this possibility.
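
      To make the concern concrete, the following minimal Python sketch contrasts the two scoring schemes on the 15-second example above. Everything in it is hypothetical and for illustration only: the 30 fps frame rate, the bout coordinates, and the function names are assumptions, and the code is not the authors' evaluation pipeline or any JAABA routine.

```python
def bouts_to_frames(bouts):
    """Expand (start, end) bouts, with end exclusive, into a set of frame indices."""
    frames = set()
    for start, end in bouts:
        frames.update(range(start, end))
    return frames


def bout_level_precision(predicted, truth):
    """Fraction of predicted bouts sharing at least one frame with ground truth."""
    truth_frames = bouts_to_frames(truth)
    hits = sum(
        1 for start, end in predicted
        if any(frame in truth_frames for frame in range(start, end))
    )
    return hits / len(predicted)


def frame_level_precision(predicted, truth):
    """Fraction of predicted frames that are also ground-truth frames."""
    predicted_frames = bouts_to_frames(predicted)
    truth_frames = bouts_to_frames(truth)
    return len(predicted_frames & truth_frames) / len(predicted_frames)


FPS = 30  # assumed frame rate, chosen only for this illustration

# One predicted 15 s bout (frames 0..449) and one ground-truth bout that
# begins on the last predicted frame, so the two overlap in exactly one frame.
predicted_bouts = [(0, 15 * FPS)]
truth_bouts = [(15 * FPS - 1, 15 * FPS + 60)]

bout_precision = bout_level_precision(predicted_bouts, truth_bouts)
frame_precision = frame_level_precision(predicted_bouts, truth_bouts)

print(f"bout-level precision:  {bout_precision:.3f}")
print(f"frame-level precision: {frame_precision:.4f}")
```

      Under these assumptions the bout-level criterion scores the prediction as fully correct, whereas only 1 of 450 predicted frames matches the ground truth. Real data would rarely be this extreme, but the sketch shows why reporting both metrics is informative.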

      In summary, this work provides a practical and accessible approach to quantifying Drosophila behavior, reducing the economic barriers to the study of the neural and molecular mechanisms underlying social behavior.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript addresses the development of a low-cost behavioural setup and standardised open-source high-performing classifiers for aggression and courtship behaviour. It does so by using readily available laboratory equipment and previously developed software packages. By comparing the performance of the setup and the classifiers to previously developed ones, this study shows the classifiers' superior performance and the reliability of the low-cost setup in recapitulating previously described effects of different manipulations on aggression and courtship.

      Strengths:

      The newly developed classifiers for lunges, wing extension, attempted copulation, copulation, following, and circling perform better than previously developed ones. The behavioural setup developed is low cost and reliably allows analysis of both aggression and courtship behaviour, validated through social experience manipulation (social isolation), gene knockdown (Dsk in Dilp2 neurons) and neuronal inactivation (dopaminergic neurons) known to affect courtship and aggression.

      Weaknesses:

      Aggression encompasses multiple defined behaviours, yet only lunges were analysed. Moreover, the CADABRA software to which DANCE was compared analyses further aggression behaviours, making their comparisons incomplete. In addition, though DANCE performs better than CADABRA and Divider in classifying lunges in the behavioural setup tested, it did not yield very high recall and F1 scores.

      DANCE is of limited use for neuronal circuit-level enquiries, since mechanisms for intensity and temporally controlled optogenetic manipulations, which are nowadays possible with open-source software and low-cost hardware, were not embedded in its development.

    4. Reviewer #3 (Public review):

      The preprint by Yadav et al. describes a new setup to quantify a number of aggression and mating behaviors in Drosophila melanogaster. The investigation of these behaviors requires the analysis of a large number of videos to identify each kind of behavior displayed by a fly. Several approaches to automatize this process have been published before, but each of them has its limitations. The authors set out to develop a new setup that includes very low-cost, easy-to-acquire hardware and open-source machine-learning classifiers to identify and quantify the behavior.

      Strengths:

      (1) The study demonstrates that their cheap, simple, and easy-to-obtain hardware works just as well as custom-made, specialized hardware for analyzing aggression and mating behavior. This enables the setup to be used in a wide range of settings, from research with limited resources to classroom teaching.

      (2) The authors used previously published software to train new classifiers for detecting a range of behaviors related to aggression and mating and to make them freely available. The classifiers are very positively benchmarked against a manually acquired ground truth as well as existing algorithms.

      (3) The study demonstrates the applicability of the setup (hardware and classifiers) to common methods in the field by confirming a number of expected phenotypes with their setup.

      Weaknesses:

      (1) When measuring the performance of the duration-based classifiers, the authors count any bout of behavior as a true positive if it overlaps with a ground-truth positive for only 1 frame, even though the minimal duration of a bout is 10 frames and most bouts are much longer. That way, true positives could contain cases that are almost totally wrong as long as there was an overlap of a single frame. For the mating behaviors that are classified in ongoing bouts, I think performance should be evaluated based on the % of correctly classified frames, not bouts.

      (2) In the methods part, only one of the pre-existing algorithms (MateBook) is described. Given that the comparison with those algorithms is such a central part of the manuscript, each of them should be briefly explained and the settings used in this study should be described.

      Taken together, this work can greatly facilitate research on aggression and mating in Drosophila. The combination of low-cost, off-the-shelf hardware and open-source, robust software enables researchers with very little funding or technical expertise to contribute to the scientific process and also allows large-scale experiments, for example in classroom teaching with many students, or for systematic screenings.

    1. Reviewer #2 (Public Review):

      In this study, the authors characterize the defensive responses of C. elegans to the predatory Pristionchus species. Drawing parallels to ecological models of predatory imminence and prey refuge theory, they outline various behaviors exhibited by C. elegans when faced with predator threats. They also find that these behaviors can be modulated by the peptide NLP-49 and its receptor SEB-3 to varying degrees.

      The conclusions of this paper are mostly well-supported, and the writing and the figures are clear and easy to interpret. However, some of the claims need to be better supported and the unique findings of this work should be clarified better in the text.

      (1) Previous work by the group (Quach, 2022) showed that Pristionchus adopt a "patrolling strategy" on a lawn with adult C. elegans and this depends on bacterial lawn thickness. Consequently, it may be hypothesized that C. elegans themselves will adopt different predator avoidance strategies depending on predator tactics differing due to lawn variations. The authors have not shown why they selected a particular size and density of bacterial lawn for the experiments in this paper, and should run control experiments with thinner and denser lawns with differing edge densities to make broad arguments about predator avoidance strategies for C. elegans. In addition, C. elegans leaving behavior from bacterial lawns (without predators) is also heavily dependent on the density of bacteria, especially at the edges where it affects oxygen gradients (Bendesky, 2011), and might alter the baseline leaving rates irrespective of predation threats. The authors also do not mention if all strains or conditions in each figure panel were run as day-matched controls. Given that bacterial densities and ambient conditions can affect C. elegans behavior, especially that of lawn-leaving, it is important to run day-matched controls.

      (2) Both the patch-leaving and feeding in outstretched posture behaviors described here in this study were reported in an earlier paper by the same group (Quach, 2022) as mentioned by the authors in the first section of the results. While they do characterize these further in this study, these are not novel findings of this work.

      (3) For Figures 1F-H, given that animals can reside on the lawn edges as well as the center, bins explored are not a definitive metric of exploration since the animals can decide to patrol the lawn boundary (especially since the lawns have thick edges). The authors should also quantify tracks along the edge from videographic evidence as they have done previously in Figure 5 of Quach, 2022 to get a total measure of distance explored.

      (4) Where were the animals placed in the wide-arena predator-free patch post encounter? It is mentioned that the animal was placed at the center of the arena in lines 220-221. While this makes sense for the narrow-arena, it is unclear how far from the patch animals were positioned for the wide exit arena. Is it the same distance away as the distance of the patch from the center of the narrow exit arena? Please make this clear in the text or in the methods.

      (5) Do exit decisions from the bacterial patch scale with number of bites or is one bite sufficient? Do all bites lead to bite-induced aversive response? This would be important to quantify especially if contextualizing to predatory imminence.

      (6) Why are the threats posed by aversive but non-lethal JU1051 and lethal PS312 evaluated similarly? Did the authors characterize if the number of bites are different for these strains? Can the authors speculate on why this would happen in the discussion?

      (7) The authors indicate that bites from the non-aversive TU445 led to a low number of exits and thus it was consequently excluded from further analysis. If anything, this strain would have provided a good negative control and baseline metrics for other circa-strike and post-encounter behaviors.

      (8) For Figures 3G and H, the reduction in bins explored (bins_none - bins_RS5194) due to the presence of predators should be compared between wildtype and mutants, instead of the difference between none and RS5194 for each strain.

      (9) While the authors argue that baseline speeds of seb-3 are similar to wild type (Figure S3), previous work (Jee, 2012) has shown that seb-3 not only affects speed but also roaming/dwelling states which will significantly affect the exploration metric (bins explored) which the authors use in Figs 3G-H and 4E-F. Control experiments are necessary to avoid this conundrum. Authors should either visualize and quantify tracks (as suggested in 3) or quantify roaming-dwelling in the seb-3 animals in the absence of predator threat.

      (10) While it might be beyond the scope of the study, it would be nice if the authors could speculate on potential sites of actions of NLP-49 in the discussion, especially since it is expressed in a distinct group of neurons.

    2. Reviewer #1 (Public Review):

      Summary:

      In this manuscript, Quach et al. report a detailed investigation into the defense mechanisms of Caenorhabditis elegans in response to predatory threats from Pristionchus pacificus. Based on principles from predatory imminence and prey refuge theories, the authors delineate three defense modes (pre-encounter, post-encounter, and circa-strike) corresponding to increasing levels of threat proximity. These modes are observed in a controlled but naturalistic setup and are quantified by multiple behavioral outputs defined in time and/or space domains allowing nuanced phenotypic assays. The authors demonstrate that C. elegans displays graded defense behavioral responses toward varied lethality of threats and that only life-threatening predators trigger all three defense modes. The study also offers a narrative on the behavioral strategies and underlying molecular regulation, focusing on the roles of SEB-3 receptors and NLP-49 peptides in mediating responses in these defense modes. They found that the interplay between SEB-3 and NLP-49 peptides appears complex, as evidenced by the diverse outcomes when either or both genes are manipulated in various behavioral modes.

      Strengths:

      The paper presents an interesting story, with carefully designed experiments and necessary controls, and novel findings and implications about predator-induced defensive behaviors and underlying molecular regulation in this important model organism. The design of experiments and description of findings are easy to follow and well-motivated. The findings contribute to our understanding of stress response systems and offer broader implications for neuroethological studies across species.

      Weaknesses:

      Although overall the study is well designed and motivated, the paper could benefit from further improvements on some of the methods descriptions and experiment interpretations.

    3. eLife Assessment

      This study presents a valuable finding on predator threat detection in C. elegans and the role of neuropeptide systems in defensive behavioral strategies. The evidence supporting the conclusions is solid, although additional analyses and control experiments would strengthen the claims of the study. Overall, the work is of interest to the C. elegans community as well as neuroethologists and ecologists studying predator-prey interactions.

    1. eLife Assessment

      This useful study reports detailed molecular dynamics (MD) simulations of T-cell receptors (TCRs) in complex with a peptide/MHC complex, for a better understanding of the mechanism of T-cell activation. The MD simulations provide solid evidence supporting that different TCRs can respond mechanically in different ways upon binding to the same pMHC complex. The analyses are systematic and provide testable predictions that can be evaluated by future mutagenesis and force microscopy studies.

    2. Reviewer #1 (Public review):

      Summary:

      This paper describes molecular dynamics simulations (MDS) of the dynamics of two T-cell receptors (TCRs) bound to the same major histocompatibility complex molecule loaded with the same peptide (pMHC). The two TCRs (A6 and B7) bind to the pMHC with similar affinity and kinetics, but employ different residue contacts. The main purpose of the study is to quantify via MDS the differences in the inter- and intra-molecular motions of these complexes, with a specific focus on what the authors describe as catch-bond behavior between the TCRs and pMHC, which could explain how T-cells can discriminate between different peptides in the presence of weak separating force.

      Strengths:

      The authors present extensive simulation data that indicates that, in both complexes, the number of high-occupancy inter-domain contacts initially increases with applied load, which is generally consistent with the authors' conclusion that both complexes exhibit catch-bond behavior, although to different extents. In this way, the paper expands our understanding of peptide discrimination by T-cells. The conclusions of the study are generally well supported by data. Further, the paper makes predictions about the relative strength of the catch-bond response of the two TCRs, which could be tested experimentally through protein mutagenesis and force application in Atomic Force Microscopy.

    3. Reviewer #2 (Public review):

      In this work, Chang-Gonzalez and coworkers follow up on an earlier study on the force-dependence of peptide recognition by a T-cell receptor using all-atom molecular dynamics simulations. In this study, they compare the results of pulling on a TCR-pMHC complex between two different TCRs with the same peptide. A goal of the paper is to determine whether the newly studied B7 TCR has the same load-dependent behavior mechanism shown in the earlier study for A6 TCR. The primary result is that while the unloaded interaction strength is similar, A6 exhibits more force-stabilization.

      This is a detailed study, and establishing the difference between these two systems with and without applied force may establish them as a good reference setup for others who want to study mechanobiological processes if the data were made available, and could give additional molecular details for T-Cell-specialists.

    4. Reviewer #3 (Public review):

      Summary:

      The paper by Chang-Gonzalez et al. is a molecular dynamics (MD) simulation study of the dynamic recognition (load-induced catch bond) by the T cell receptor (TCR) of the complex of peptide antigen (p) and the major histocompatibility complex (pMHC) protein. The methods and simulation protocols are essentially identical to those employed in a previous study by the same group (Chang-Gonzalez et al., eLife 2024). In the current manuscript, the authors compare the binding of the same pMHC complex to two different TCRs, B7 and A6, the latter of which was investigated in the previous paper. While the binding is more stable for both TCRs under load (of about 10-15 pN) than in the absence of load, the main difference is that B7 shows a smaller number of stable contacts with the pMHC than A6.

      Strengths:

      The topic is interesting because of the relevance of mechanosensing in biological processes including cellular immunology. The MD simulations provide strong evidence that different TCRs can respond mechanically in different ways upon binding the same pMHC complex. These findings are useful for interpreting how mechanical force is employed for modulating different functions of T cells.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1:

      Summary:

      This paper describes molecular dynamics simulations (MDS) of the dynamics of two T-cell receptors (TCRs) bound to the same major histocompatibility complex molecule loaded with the same peptide (pMHC). The two TCRs (A6 and B7) bind to the pMHC with similar affinity and kinetics, but employ different residue contacts. The main purpose of the study is to quantify via MDS the differences in the inter- and intra-molecular motions of these complexes, with a specific focus on what the authors describe as catch-bond behavior between the TCRs and pMHC, which could explain how T-cells can discriminate between different peptides in the presence of weak separating force.

      Strengths:

      The authors present extensive simulation data that indicates that, in both complexes, the number of high-occupancy interdomain contacts initially increases with applied load, which is generally consistent with the authors’ conclusion that both complexes exhibit catch-bond behavior, although to different extents. In this way, the paper somewhat expands our understanding of peptide discrimination by T-cells.

      a. The reviewer makes a thoughtful assessment of our manuscript. While our manuscript is meant to be a “short” contribution, our significant new finding is that even for TCRs targeting the same pMHC, having similar structures, and leading to similar functional outcomes in conventional assays, their response to applied load can be different. This supports our recent experimental work where TCRs targeting the same pMHC differed in their catch bond characteristics, and importantly, in their response to limiting copy numbers of pMHCs on the antigen-presenting cell (Akitsu et al., Sci. Adv., 2024).

      Weaknesses:

      While generally well supported by data, the conclusions would nevertheless benefit from a more concise presentation of information in the figures, as well as from suggesting experimentally testable predictions.

      b. We have updated all figures for clear and streamlined presentation. We have also created four figure supplements to cover more details.

      Regarding testable predictions, an important prediction is that B7 TCR would exhibit a weaker catch bond behavior than A6 (line 297–298). This is a nontrivial prediction because the two TCRs targeting the same pMHC have similar structures and are functionally similar in conventional assays. This prediction can be tested by single-molecule optical tweezers experiments. Based on our recent experiments (Akitsu et al., Sci. Adv., 2024), we also predict that A6 and B7 TCRs will differ in their ability to respond to cases when the number of pMHC molecules presented is limited. Details of how they would differ require further investigation, which is beyond the scope of the present work (line 314–319).

      Another testable prediction for the conservation of the basic allostery mechanism is to test the Cβ FG-loop deletion mutant located at the hinge region of the β chain, where the deletion severely impairs the catch bond formation (line 261–264).

      Reviewer 2:

      In this work, Chang-Gonzalez and coworkers follow up on an earlier study on the force-dependence of peptide recognition by a T-cell receptor using all-atom molecular dynamics simulations. In this study, they compare the results of pulling on a TCR-pMHC complex between two different TCRs with the same peptide. A goal of the paper is to determine whether the newly studied B7 TCR has the same load-dependent behavior mechanism shown in the earlier study for A6 TCR. The primary result is that while the unloaded interaction strength is similar, A6 exhibits more force stabilization.

      This is a detailed study, and establishing the difference between these two systems with and without applied force may establish them as a good reference setup for others who want to study mechanobiological processes if the data were made available, and could give additional molecular details for T-Cell-specialists. As written, the paper contains an overwhelming amount of details and it is difficult (for me) to ascertain which parts to focus on and which results point to the overall take-away messages they wish to convey.

      R2-a. As mentioned above and as the reviewer correctly pointed out, the condensed appearance of this manuscript arose largely because we intended it to be a Research Advances article as a short follow up study of our previous paper on A6 TCR published in eLife. Most of the analysis scripts for the A6 TCR study are already available on Github. For the present manuscript, we have created a separate Github repository containing sample simulation systems and scripts for the B7 TCR.

      Regarding the focus issue, it is in part due to the complex nature of the problem, which required simulations under different conditions and multi-faceted analyses. We believe the extensive updates to the figures and texts make for a clearer and improved presentation. But we note that even in the earlier version, the reviewer pointed out the main take-away message well: “The primary result is that while the unloaded interaction strength is similar, A6 exhibits more force stabilization.”

      Detailed comments:

      (1) In Table 1 - are the values of the extension column the deviation from the average length at zero force (that is what I would term extension) or is it the distance between anchor points (which is what I would assume based on the large values)? If the latter, I suggest changing the heading, and then also reporting the average extension with an asterisk indicating no extensional restraints were applied for B7-0, or just listing 0 load in the load column. Standard deviation in this value can also be reported. If it is an extension as I would define it, then I think B7-0 should indicate extension = 0+/- something. The distance between anchor points could also be labeled in Figure 1A.

      R2-b. “Extension” is the distance between anchor points that the reviewer is referring to (blue spheres at the ends of the added strands in Figure 1A). While its meaning should be clear in the section “Laddered extensions” in “MD simulation protocol” (line 357–390), in a strict sense, we agree that using it for the end-to-end distance can be confusing. However, since we have already used it in our previous two papers (Hwang et al., PNAS 2020 and Chang-Gonzalez et al., eLife, 2024), we prefer to keep it for consistency. Instead, in the caption of Table 1, we explained its meaning, and also explicitly labeled it in Figure 1A, as the reviewer suggested.

      Please also note that the no-load case B7<sup>0</sup> was performed by separately building a TCR-pMHC complex without added linkers (line 352), and holding the distal part of pMHC (the α3 domain) with weak harmonic restraints (line 406–408). Thus, no extension can be assigned to B7<sup>0</sup>. We added a brief explanation about holding the MHC α3 domain for B7<sup>0</sup> in line 83–85.

      (2) As in the previous paper, the authors apply “constant force” by scanning to find a particular bond distance at which a desired force is selected, rather than simply applying a constant force. I find this approach less desirable unless there is experimental evidence suggesting the pMHC and TCR were forced to be a particular distance apart when forces are applied. It is relatively trivial to apply constant forces, so in general, I would suggest this would have been a reasonable comparison. Line 243-245 speculates that there is a difference in catch bonding behavior that could be inferred because lower force occurs at larger extensions, but I do not believe this hypothesis can be fully justified and could be due to other differences in the complex.

      R2-c. There is indeed experimental evidence that the TCR-pMHC complex operates under constant separation. The spacing between a T-cell and an antigen-presenting cell is maintained by adhesion molecules such as the CD2-CD58 pair, as explained in our paper on the A6 TCR (Chang-Gonzalez et al., eLife, 2024) and also in our previous review paper (Reinherz et al., PNAS, 2023). In in vitro single-molecule experiments, pulling to a fixed separation and holding is also commonly done. We added an explanation about this in line 79–83 of the manuscript. On the other hand, force between a T cell and an antigen-presenting cell is also controlled by the actin cytoskeleton, which makes the applied load not a simple function of the separation between the two cells. An explanation about this was added in line 300–303. Detailed comparison between constant extension vs. constant force simulations is definitely a subject of our future study.

      Regarding line 243–245 of the original submission (line 297–298 of the revised manuscript), we agree with the reviewer that without further tests, lower forces at larger extensions per se cannot be an indicator that B7 forms a weaker catch bond. But with additional information, one can see it does have relevance to the catch bond strength. In addition to fewer TCR-pMHC contacts (Figure 1C of our manuscript), the intra-TCR contacts are also reduced compared to those of A6 (bottom panel of Figure 1D vs. Chang-Gonzalez et al., eLife, 2024, Figure 8A,B, first column). Based on these data, we calculated the average total intra-TCR contact occupancies in the 500–1000-ns interval, which was 30.4±0.49 (average±std) for B7 and 38.7±0.87 for A6. This result shows that the B7 TCR forms a looser complex with pMHC compared to A6. Also, B7<sup>low</sup> and B7<sup>high</sup> differ in extension by 16.3 Å while A6<sup>low</sup> and A6<sup>high</sup> differ by 5.1 Å, for a similar ∼5-pN difference between low- and high-load cases. With the higher compliance of B7, it would be more difficult to achieve load-induced stabilization of the TCR-pMHC interface, hence a weaker catch bond. We explained this in line 129–132 and line 292–297.
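
      The compliance argument above can be made concrete with a back-of-the-envelope calculation. The sketch below is only a rough illustration, assuming that the complex responds linearly between the low- and high-load cases and that the force difference is exactly 5 pN (the response quotes it only as approximately 5 pN); the function name is hypothetical.

```python
def effective_compliance(delta_extension_A, delta_force_pN):
    """Extension change per unit force change (Å/pN), assuming a linear response."""
    return delta_extension_A / delta_force_pN


# Extension differences quoted in the response; the 5-pN force difference
# is an approximation ("~5 pN"), so the results are order-of-magnitude only.
b7 = effective_compliance(16.3, 5.0)
a6 = effective_compliance(5.1, 5.0)

print(round(b7, 2))       # 3.26 Å/pN
print(round(a6, 2))       # 1.02 Å/pN
print(round(b7 / a6, 1))  # 3.2
```

      Under these assumptions the B7 complex is roughly three times more compliant than the A6 complex, which matches the qualitative statement that B7 forms a looser complex.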

      (3) On a related note, the authors do not refer to or consider other works using MD to study force-stabilized interactions (e.g. for catch bonding systems), e.g. these cases where constant force is applied and enhanced sampling techniques are used to assess the impact of that applied force: https://www.cell.com/biophysj/fulltext/S0006-3495(23)00341-7, https://www.biorxiv.org/content/10.1101/2024.10.10.617580v1. I was also surprised not to see this paper on catch bonding in pMHC-TCR referred to, which also includes some MD simulations: https://www.nature.com/articles/s41467-023-38267-1

      R2-d. We thank the reviewer for bringing the three papers to our attention, which are:

      (1) Languin-Cattoën, Sterpone, and Stirnemann, Biophys. J. 122:2744 (2023): About bacterial adhesion protein FimH.

      (2) Peña Ccoa, et al., bioRxiv (2024): About actin binding protein vinculin.

      (3) Choi et al., Nat. Comm. 14:2616 (2023): About a mathematical model of the TCR catch bond.

      Catch bond mechanisms of FimH and vinculin are different from that of TCR in that FimH and vinculin have relatively well-defined weak- and strong-binding states where there are corresponding crystal structures. Availability of the end-state structures permits simulation approaches such as enhanced sampling of individual states and studying the transition between the two states. In contrast, TCR does not have any structurally well-defined weak- or strong-binding states, which requires a different approach. As demonstrated in our current manuscript as well as in our previous two papers (Hwang et al., PNAS 2020 and Chang-Gonzalez et al., eLife, 2024), our microsecond-long simulations of the complex under realistic pN-level loads and a combination of analysis methods are effective for elucidating the catch bond mechanism of TCR. These are explained in line 227–238 of the manuscript.

      The third paper (Choi, et al., 2023) proposes a mathematical model to analyze extensive sets of data, and also performs new experiments and additional simulations. Of note, their model assumptions are based mainly on the steered MD (SMD) simulation in their previous paper (Wu, et al., Mol. Cell. 73:1015, 2019). In their model, formation of a catch bond (called catch-slip bond in Choi’s paper) requires partial unfolding of MHC and tilting of the TCR-pMHC interface. Our mechanism does not conflict with their assumptions since the complex in the fully folded state should first bear load in a ligand-dependent manner in order to allow any larger-scale changes. This is explained in line 239–243.

      For the revised text mentioned above (line 227–243), in addition to the 3 papers that the reviewer pointed out, we cited the following papers:

      • Thomas, et al., Annu. Rev. Biophys. 2008: Catch bond mechanisms in general.

      • Bakolitsa et al., Cell 1999, Le Trong et al., Cell 2010, Sauer et al., Nat. Comm. 2016, Mei et al., eLife 2020: Crystal structures of FimH and vinculin in different states.

      • Wu, et al., Mol. Cell. 73:1015, 2019: The SMD simulation paper mentioned above.

      (4) The authors should make at least the input files for their system available in a public place (github, zenodo) so that the systems are a more useful reference system as mentioned above. The authors do not have a data availability statement, which I believe is required.

      R2-d. As mentioned in R2-a above, we have added a Github repository containing sample simulation systems and scripts for the B7 TCR.

      Reviewer 3:

      Summary:

      The paper by Chang-Gonzalez et al. is a molecular dynamics (MD) simulation study of the dynamic recognition (load-induced catch bond) by the T cell receptor (TCR) of the complex of peptide antigen (p) and the major histocompatibility complex (pMHC) protein. The methods and simulation protocols are essentially identical to those employed in a previous study by the same group (Chang-Gonzalez et al., eLife 2024). In the current manuscript, the authors compare the binding of the same pMHC to two different TCRs, B7 and A6 which was investigated in the previous paper. While the binding is more stable for both TCRs under load (of about 10-15 pN) than in the absence of load, the main difference is that, with the current MD sampling, B7 shows a smaller amount of stable contacts with the pMHC than A6.

      Strengths:

      The topic is interesting because of the (potential) relevance of mechanosensing in biological processes including cellular immunology.

      Weaknesses:

      The study is incomplete because the claims are based on a single 1000-ns simulation at each value of the load and thus some of the results might be marred by insufficient sampling, i.e., statistical error. After the first 600 ns, the higher load of B7<sup>high</sup> than B7<sup>low</sup> is due mainly to the simulation segment from about 900 ns to 1000 ns (Figure 1D). Thus, the difference in the average value of the load is within their standard deviation (9 +/- 4 pN for B7<sup>low</sup> and 14.5 +/- 7.2 for B7<sup>high</sup>, Table 1). Even more strikingly, Figure 3E shows a lack of convergence in the time series of the distance between the V-module and pMHC, particularly for B7<sup>0</sup> (left panel, yellow) and B7<sup>low</sup> (right panel, orange). More and longer simulations are required to obtain a statistically relevant sampling of the relative position and orientation of the V-module and pMHC.

      R3-a. The reviewer uses data points during the last 100 ns to raise an issue with sampling. But since we are using realistic pN range forces, force fluctuates more slowly. In fact, in our simulation of B7<sup>high</sup>, while the force peaks near 35 pN at 500 ns (Figure 1D of our manuscript), the interfacial contacts show no noticeable changes around 500 ns (Figure 2B and Figure 2–figure supplement 1C of our manuscript). Similarly slow fluctuation of force was also observed for A6 TCR (Figure 8 of Chang-Gonzalez et al., eLife (2024)). Thus, a wider time window must be considered rather than focusing on forces in the last 100-ns interval.

      To compare fluctuation in forces, we added Figure 1–figure supplement 2, which is based on Appendix 3–Figure 1 of our A6 paper. It shows the standard deviation in force versus the average force during 500–1000 ns interval for various simulations in both A6 (open black circles) and B7 (red squares) systems. Except for Y8A<sup>low</sup> and dFG<sup>low</sup> of A6 (explained below), the data points lie on nearly a straight line.

      Thermodynamically, the force and position of the restraint (blue spheres in Figure 1A of our manuscript) form a pair of generalized force and the corresponding spatial variable in equilibrium at temperature 300 K, which is akin to the pressure P and volume V of an ideal gas. If V is fixed, P fluctuates. Denoting the average and std of pressure as ⟨P⟩ and ∆P, respectively, Burgess showed that ∆P/⟨P⟩ is a constant (Eq. 5 of Burgess, Phys. Lett. A, 44:37; 1973). In the case of the TCRαβ-pMHC system, although individual atoms are not ideal gases, since their motion leads to the fluctuation in force on the restraints, the situation is analogous to the case where pressure arises from individual ideal gas molecules hitting the confining wall as the restraint. Thus, the near-linear behavior in the figure above is a consequence of the system being many-bodied and at constant temperature. The linearity is also an indicator that sampling of force was reasonable in the 500–1000-ns interval. The fact that A6 and B7 data show a common linear profile further demonstrates the consistency in our force measurement. About the two outliers of A6, Y8A<sup>low</sup> is for an antagonist peptide and dFG<sup>low</sup> is the Cβ FG-loop deletion mutant. Both cases had reduced numbers of contacts with pMHC, which likely caused a wider conformational motion, hence greater fluctuation in force.
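
      As a hedged illustration of the quantities discussed above, the sketch below computes the average force, its standard deviation, and their ratio over a fixed time window. It is not the authors' analysis script: the function name, the use of the population standard deviation, and the short force series are all assumptions (the numbers are placeholders, not simulation data).

```python
from statistics import mean, pstdev


def window_force_stats(times_ns, forces_pN, t_start=500.0, t_end=1000.0):
    """Return (average, std, std/average) of the force within [t_start, t_end]."""
    window = [f for t, f in zip(times_ns, forces_pN) if t_start <= t <= t_end]
    avg = mean(window)
    std = pstdev(window)  # population std; a sample std (stdev) is an equally valid choice
    return avg, std, std / avg


# Placeholder series (hypothetical values, for illustration only).
times = [400, 500, 600, 700, 800, 900, 1000, 1100]
forces = [3.0, 8.0, 12.0, 10.0, 6.0, 14.0, 10.0, 20.0]

avg, std, rel = window_force_stats(times, forces)
print(avg, round(std, 3), round(rel, 3))
```

      If the ratio of standard deviation to average is roughly the same across simulations, the points fall on a straight line through the origin in a plot of standard deviation versus average force, which is the behavior the response describes.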

      Upon suggestion by the reviewer, we extended the simulations of B7<sup>0</sup>, B7<sup>low</sup> and B7<sup>high</sup> to about 1500 ns (Table 1). While B7<sup>0</sup> and B7<sup>low</sup> behaved similarly, B7<sup>high</sup> started to lose contacts at around 1300 ns (top panel of Figure 1D and Figure 2B). A closer inspection revealed that destabilization occurred when the complex reached low-force states. Even before 1300 ns, at about 750 ns, the force on B7<sup>high</sup> drops below 5 pN, and another drop in force occurred at around 1250 ns, though to a lesser extent (Figure 1D). These changes are followed by an increase in the Hamming distance (Figure 2B). Thus, in B7<sup>high</sup>, destabilization is caused not by a high force, but by a lack of force, which is consistent with the overarching theme of our work, the load-induced stabilization of the TCRαβ-pMHC complex.

      The destabilization of B7<sup>high</sup> during our simulation is a combined effect of its overall weaker interface compared to A6 (despite having a comparable number of contacts in crystal structures; line 265–269), and its high compliance (explained in the second paragraph of our response R2-c above). Under a fixed extension, a complex with higher compliance can reach a low-force state where breakage of contacts can happen. In reality, with an approximately constant spacing between a T cell and an antigen-presenting cell, force is also regulated by the actin cytoskeleton (explained in the first paragraph of R2-c above). While detailed comparison between constant-extension and constant-force simulation is the subject of a future study, for this manuscript, we used the 500–1000-ns interval for calculating time-averaged quantities, for consistency across different simulations. For time-dependent behaviors, we showed the full simulation trajectories, which are Figure 1D, Figure 2B, Figure 2–figure supplement 1 (except for panel E), and Figure 4–figure supplement 1B.

      Thus, rather than performing replicate simulations, we perform multiple simulations under different conditions and analyze them from different angles to obtain a consistent picture. If one were interested in quantitative details under a given condition, e.g., dynamics of contacts for a given extension or the time when destabilization occurs at a given force, replicate simulations would be necessary. However, our main conclusions such as load-induced stabilization of the interface through the asymmetric motion, and B7 forming a weaker complex compared to A6, can be drawn from our extensive analysis across multiple simulations. Please also note that reviewer 1 mentioned that our conclusions are “generally well supported by data.”

      A similar argument applies to Figure 2–figure supplement 1F (old Figure 3B that the reviewer pointed out). If precise values of the V-module to pMHC distance were needed, replicate simulations would be necessary. However, the figure demonstrates that B7<sup>high</sup> maintains a more stable interface before the disruption at 1300 ns compared to B7<sup>low</sup>, which is consistent with all other measures of interfacial stability we used. The above points are explained throughout our updated manuscript, including

      • Line 106–110, 125–132, 156–158, 298–303.

      • Figures showing time-dependent behaviors have been updated and Figure 1–figure supplement 2 has been added, as explained above.

      It is not clear why “a 10 Å distance restraint between alphaT218 and betaA259 was applied” (section MD simulation protocol, page 9).

      R3-b. αT218 and βA259 are the residues attached to a leucine-zipper handle in in vitro optical trap experiments (Das, et al., PNAS 2015). In T cells, those residues also connect to transmembrane helices. Our newly added Figure 1–figure supplement 1 shows a model of N15 TCR used in experiments in Das’ paper, constructed based on PDB 1NFD. Blue spheres represent C<sub>α</sub> atoms corresponding to αT218 and βA259 of B7 TCR. Their distance is 6.7 Å. The 10-Å distance restraint in simulation was applied to mimic the presence of the leucine zipper that prevents excessive separation of the added strands. The distance restraint is a flat-bottom harmonic potential which is activated only when the distance between the two atoms exceeds 10 Å, which we did not clarify in our original manuscript. It is now explained in line 371–373. The same restraint was used in our previous studies on JM22 and A6 TCRs.
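
      As an illustration of the flat-bottom harmonic restraint described in R3-b, the following is a minimal sketch rather than the simulation code used by the authors. The function name, the force constant, and the 1/2 prefactor are assumptions; only the 10 Å threshold and the 6.7 Å reference distance come from the response.

```python
def flat_bottom_restraint(distance, threshold=10.0, force_constant=1.0):
    """Flat-bottom harmonic restraint energy (illustrative sketch).

    distance and threshold are in angstroms. force_constant is a
    hypothetical value (the response does not state one), and the
    1/2 prefactor is a convention that differs between MD packages.
    """
    excess = distance - threshold
    if excess <= 0.0:
        return 0.0  # inside the flat region: the restraint is inactive
    return 0.5 * force_constant * excess ** 2


# The 6.7 A separation in the N15 model lies inside the flat region,
# so the restraint contributes nothing until the strands separate past 10 A.
for r in (6.7, 10.0, 12.0):
    print(r, flat_bottom_restraint(r))
```

      Under these assumptions the restraint exerts no force at the modeled 6.7 Å separation and only penalizes separations beyond the threshold.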

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Clarify the reason for including arguably non-physiological simulations, in which the C domain is missing. Is the overall point that it is essential for proper peptide discrimination?

      R1-c. This is somewhat a philosophical question. Rather than recapitulating experiment, we believe the goal of simulation is to gain insight. Hence, a model should be justified by its utility rather than its direct physiological relevance. The system lacking the C-module is useful since it informs about the allosteric role of the C-module by comparing its behavior with that of the full TCRαβ-pMHC complex. The increased interfacial stability of Vαβ-pMHC is also consistent with our discovery that the C-module likely undergoes a partial unfolding to an extended state, where the bond lifetime increases (Das, et al., PNAS 2015; Akitsu et al., Sci. Adv., 2024). In this sense, Vαβ-pMHC has a more direct physiological relevance. Furthermore, considering single-chain versions of an antibody lacking the C-module (scFv) are in widespread use (Ahmad et al., J. Immunol. Res., 2012) including CAR T cells, a better understanding of a TCR lacking the C-module may help with developing a novel TCR-based immunotherapy. These explanations have been added in line 253–261.

      (2) Suggest changing Vαβ-pMHC to B7<sup>0</sup>∆C to emphasize that the constant domain is deleted.

      R1-d. While we appreciate the reviewer’s suggestion, the notation Vαβ-pMHC was used in our previous two papers (Hwang, PNAS 2020, Chang-Gonzalez, eLife 2024). We thus prefer to keep the existing notation.

      (3) Suggest adding A6 data to table 1 for comparison, making it clear if it is from a previous paper.

      R1-e. Table 1 of the present manuscript and Table 1 of the A6 paper differ in items displayed. Instead of merging, we added the extension and force for A6 corresponding to B7<sup>low</sup> and B7<sup>high</sup> in the caption of Table 1.

      (4) Suggest discussing the catch-bond behavior in terms of departure from equilibrium, e.g. is it possible to distinguish between different (catch vs slip) bond behaviors on the basis of work of separation histograms? If the difference does not show up in equilibrium work, the exponential work averages would be similar, but work histograms could be very different.

      R1-f. Although energetics of the catch versus slip bond will provide additional insight, it is beyond the scope of the present simulations that do not involve dissociation events nor simulations of slip-bond receptors. We instead briefly mention the energetic aspect in terms of T-cell activation in line 316–319.

      (5) Have the simulations in Figure 1 reached steady state? The force and occupancy increase almost linearly up until 500ns, then seem to decrease rather dramatically by 750ns. It might be worthwhile to extend one simulation to check.

      R1-g. We did extend the simulation to about 1500 ns. The large and slow fluctuation in force is an inherent property of the system, as explained in R3-a above.

      (6) Is the loss of contacts for B7<sup>0</sup> due to thermalization and relaxation away from the X-ray structure?

      R1-h. The initial thermalization at 300 K is not responsible for the loss of contacts for B7<sup>0</sup> since we applied distance restraints to the initial contacts to keep them from breaking during the preparatory runs (line 358–370). While ‘relaxation away from the X-ray structure’ gives an impression that the complex approaches an equilibrium conformation in the absence of the crystallographic confinement, our simulation indicates that the stability of the complex depends on the applied load. We made the distinction between relaxation and the load-dependent stability clearer in line 233–238.

      (7) Figure 4 contains a very large amount of data. Could it be simplified and partly moved to SI? For example, panel G is somewhat hard to read at this scale, and seems non-essential to the general reader.

      R1-i. Upon the reviewer’s suggestion, we simplified Figure 4 by moving some of the panels to Figure 4–figure supplement 1. Panels have also been made larger for better readability.

      (8) If the coupling between C and V domains is necessary for catch-bond behavior, can one propose mutations that would disrupt the interface to test by experiment? This would be interesting in light of the authors’ own comment on p. 8 that ‘a logical evolutionary pressure would be for the C domains to maximize discriminatory power by adding instability to the TCR chassis,’ which might lead to a verifiable hypothesis.

      R1-j. This has already been computationally and experimentally tested for other TCRs by the Cβ FG-loop deletion mutants that diminish the catch bond (Das, et al., PNAS 2015; Hwang et al., PNAS 2020; Chang-Gonzalez et al., eLife, 2024). Furthermore, the Vγδ-Cαβ chimera, in which the C-module of TCRγδ is replaced by that of TCRαβ to strengthen the V-C coupling, achieved a gain-of-function catch-bond character, whereas the wild-type TCRγδ is a slip-bond receptor (Mallis, et al., PNAS 2021; Bettencourt et al., Biophys. J. 2024). We added our prediction that the FG-loop deletion mutants of B7 TCR will behave similarly in line 261–264.

      (9) Regarding extending TCR and MHC termini using native sequences, as described in the methods, what would be the disadvantage of using the same sequence, which could be made much more rigid, e.g. a poly-Pro sequence? After all, the point seems to be applying a roughly constant force, but flexible/disordered linkers seem likely to increase force fluctuation.

      R1-k. The purpose of adding linkers was to allow a certain degree of longitudinal and transverse motion as would occur in vivo. While it will be worthwhile to explore the effects of linker flexibility on the conformational dynamics of the complex, for the present study, we used the actual sequence for the linkers for those proteins (line 341–344).

      Reviewer #2 (Recommendations for the authors):

      (1) Figure 2 is almost illegible, especially Figure 2A-D. I do not think that these contacts vs time would be useful to anyone except for someone interested in this particular pMHC interaction, so I would suggest moving it to a supporting figure and making it much larger.

      R2-e. Thanks for the suggestion. We created Figure 2–figure supplement 1 and made panels larger for clearer presentation.

      (2) Figure 4 is overwhelming, and does not convey any particular message.

      R2-f. This is the same comment as reviewer 1’s comment (7) above. Please see our response R1-i.

      Reviewer #3 (Recommendations for the authors):

      (1) The label “beta2m” in Figure 1A should be moved closer to the beta2 microglobulin domain. A label TCR should be added to Figure 1A.

      R3-c. Thanks for pointing out about β2m. We have corrected it. About putting the label ‘TCR,’ to avoid cluttering, we explained that Vα, Vβ, Cα, and Cβ are the 4 subdomains of TCR in the caption of Figure 1A.

      (2) Hydrogen atoms should be removed from the peptide in Figure 1B.

      R3-d. We have removed the hydrogen atoms.

      (3) The authors should consider moving Figures 1 A-D to the SI and show a simpler description of the contact occupancy than the heat maps. The legend of Figure 2A-D is too small.

      R3-e. By ‘Figures 1 A-D’ we believe the reviewer meant Figure 2A–D. This is the same comment as reviewer 2’s comment (1). Please see our response R2-e above.

      (4) Vertical (dashed) lines should be added to Figure 3E at 500 ns to emphasize the segment of the time series used for the histograms.

      R3-f. We added vertical lines in figures showing time-dependent behaviors, which are Figure 1D, Figure 2B, Figure 2–figure supplement 1F, and Figure 4–figure supplement 1B.

    1. eLife Assessment

      This important study shows a surprising scale-invariance of the covariance spectrum of large-scale recordings in the zebrafish brain in vivo. A convincing analysis demonstrates that a Euclidean random matrix model of the covariance matrix recapitulates these properties. The results provide several new and insightful approaches for probing large-scale neural recordings.

    2. Joint public review

      Summary:

      The authors examine the eigenvalue spectrum of the covariance matrix of whole-brain neural recordings in larval zebrafish during hunting and spontaneous behavior. They find that the spectrum is approximately power law, and, more importantly, exhibits scale-invariance under random subsampling of neurons. This property is not exhibited by conventional models of covariance spectra, motivating the introduction of the Euclidean random matrix model. The authors show that this tractable model captures the scale invariance they observe. They also examine the effects of subsampling based on anatomical location or functional relationships. Finally, they discuss the benefit of neural codes which can be subsampled without significant loss of information.

      Strengths:

      With large-scale neural recordings becoming increasingly common, neuroscientists are faced with the question: how should we analyze them? To address that question, this paper proposes the Euclidean random matrix model, which embeds neurons randomly in an abstract feature space. This model is analytically tractable and matches two nontrivial features of the covariance matrix: approximate power law scaling, and invariance under subsampling. It thus introduces an important conceptual and technical advance for understanding large-scale simultaneously recorded neural activity.

      Comment:

      Are there quantitative comparisons of the collapse indices for the null models in Figure 2 and the data covariance in 2F? If so, this could be potentially useful to report.

    1. eLife Assessment

      This study presents valuable insights into the involvement of miR-26b in the progression of metabolic dysfunction-associated steatohepatitis (MASH). The delivery of microRNA-containing nanoparticles to reduce MASH severity has practical implications as a therapeutic strategy. The authors used two sets of transgenic mouse models, conducted kinase activity profiling of mouse liver samples, and supplemented their findings with additional experiments on human liver and plasma, providing solid support for their findings.

    2. Reviewer #1 (Public review):

      Based on previous publications suggesting a potential role for miR-26b in the pathogenesis of metabolic dysfunction-associated steatohepatitis (MASH), the researchers aim to clarify its function in hepatic health and explore the therapeutic potential of lipid nanoparticles (LNPs) to treat this condition. First, they employed both whole-body and myeloid cell-specific miR-26b KO mice and observed elevated hepatic steatosis features in these mice compared to WT controls when subjected to WTD. Moreover, livers from whole-body miR-26b KO mice also displayed increased levels of inflammation and fibrosis markers. Kinase activity profiling analyses revealed distinct alterations, particularly in kinases associated with inflammatory pathways, in these samples. Treatment with LNPs containing miR-26b mimics restored lipid metabolism and kinase activity in these animals. Finally, similar anti-inflammatory effects were observed in the livers of individuals with cirrhosis, whereas elevated miR-26b levels were found in the plasma of these patients in comparison with healthy controls. Overall, the authors conclude that miR-26b plays a protective role in MASH and that its delivery via LNPs efficiently mitigates MASH development.

      The study has some strengths, most notably its use of a combination of animal models, analyses of potential underlying mechanisms, and innovative treatment delivery methods with significant promise. However, it also presents certain weaknesses that could be improved. The precise role of miR-26b in a human context remains elusive, hindering direct translation to clinical practice.

      Comments on revised version:

      Some of the recommendations provided by this Reviewer in the first version of the manuscript have been successfully addressed in the revision. However, others, particularly those related to human translation, remain unresolved due to the lack of additional samples for analysis. Since the revised title now indicates that the mechanisms described were primarily observed in mice, it seems reasonable to defer addressing this issue to future studies.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript by Peters, Rakateli et al. aims to characterize the contribution of miR-26b in a mouse model of metabolic dysfunction-associated steatohepatitis (MASH) generated by a Western-type diet on an Apoe knock-out background. In addition, the authors provide a rescue of miR-26b using lipid nanoparticles (LNPs), with potential therapeutic implications. The authors also provide useful insights on the role of macrophages and some validation of the effect of miR-26b LNPs on human liver samples.

      Strengths:

      The authors provide a well-designed mouse model that aims to characterize the role of miR-26b in metabolic dysfunction-associated steatohepatitis (MASH), generated by a Western-type diet on an Apoe knock-out background. The rescue of the model's phenotypes with miR-26b delivered in lipid nanoparticles (LNPs) opens an interesting avenue toward novel potential therapies.

      Weaknesses:

      Although the authors provide a new and interesting avenue to understand the role of miR-26b in MASH, the study needs some additional validations and mechanistic insights in order to strengthen the authors' conclusions.

      (1) Analysis of miRNA expression based on miRNA-seq of human samples (see https://ccb-compute.cs.uni-saarland.de/isomirdb/mirnas) suggests that miR-26b-5p is highly abundant in both liver and blood. It seems hard to reconcile that, despite miRNA abundance being similar in both tissues, the physiological effects claimed by the authors in Figure 2 come exclusively from the myeloid compartment (macrophages).

      - Thanks for the clarification provided on your revised version of the manuscript

      (2) Similarly, the miRNA-seq expression from isomirdb also suggests that expression of miR-26a-5p is indeed 4-fold higher than miR-26b-5p both in liver and blood. Since both miRNAs share the same seed sequence, and most of the supplemental regions (only 2 nt difference), their endogenous targets must overlap extensively. It would be interesting to know whether deletion of miR-26b is somehow compensated by increased expression of miR-26a-5p loci. That would suggest that the model is rather a depletion of miR-26.

      UUCAAGUAAUUCAGGAUAGGU mmu-miR-26b-5p mature miRNA
      UUCAAGUAAUCCAGGAUAGGCU mmu-miR-26a-5p mature miRNA
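
      The seed identity and the small number of differences noted in point (2) can be checked directly from the two sequences quoted above. The sketch below is only an illustration: it assumes the common definition of the seed as nucleotides 2-8, and the helper names are hypothetical.

```python
MIR_26B_5P = "UUCAAGUAAUUCAGGAUAGGU"   # mmu-miR-26b-5p, as quoted in the review
MIR_26A_5P = "UUCAAGUAAUCCAGGAUAGGCU"  # mmu-miR-26a-5p, as quoted in the review


def seed(mature):
    """Return nucleotides 2-8 (1-based), a common seed definition."""
    return mature[1:8]


def mismatch_positions(a, b):
    """1-based positions that differ over the shared length of a and b."""
    return [i + 1 for i, (x, y) in enumerate(zip(a, b)) if x != y]


print(seed(MIR_26B_5P), seed(MIR_26A_5P))          # seeds of both miRNAs
print(mismatch_positions(MIR_26B_5P, MIR_26A_5P))  # differing positions
print(len(MIR_26B_5P), len(MIR_26A_5P))            # mature lengths
```

      With these sequences the seeds are identical and the aligned portions differ at two positions, with miR-26a-5p carrying one additional 3' nucleotide.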

      - Thanks for the clarification provided. Nevertheless, I would note that measurements of the host transcript can be difficult to interpret. The processing of the hairpin by Drosha results in rapid decay of the remainder of the non-hairpin part, usually yielding very low expression levels. The mature levels of miR-26a-5p could be more accurate.

      (3) Similarly, the miRNA-seq expression from isomirdb also suggests that expression of miR-26b-5p is indeed 50-fold higher than miR-26b-3p in liver and blood. Such a difference in abundance between the two strands is usually taken to indicate that one of them is the guide strand (in this case the 5p) and the other the passenger (in this case the 3p). In some cases, passenger strands can be a byproduct of miRNA biogenesis, thus the rescue experiments using LNPs with both strands in equimolar amounts would not reflect the physiological abundance of miR-26b-3p. The non-physiological overabundance of miR-26b-3p would constitute a source of undesired off-targets.

      - I agree with the authors that the functional data doesn't show evidence of undesired off-targets. Nevertheless, I would consider that for future studies. miRNA-phenotypes can be subtle in normal conditions and become more obvious on stressed conditions, the same might apply to off-target effects.

      (4) It would also be valuable to check the miRNA levels on the liver upon LNP treatment, or at least the signatures of miR-26b-3p and miR-26b-5p activity using RNA-seq on the RNA samples already collected.

      - Thanks for providing the miRNA quantification on the revised version of the manuscript.

      (5) Some of the phenotypes described, such as the increase in cholesterol, overlap with the previous publication van der Vorst et al. BMC Genom Data (2021), even though in this case the authors use an Apoe knock-out background and a Western-type diet. I would encourage the authors to investigate more or discuss why the initial phenotypes don't become more obvious despite the stressors added in the current manuscript.

      - Thanks for the clarification provided on your revised version of the manuscript.

      (6) The authors have focused part of their analysis on a few gene markers that show relatively modest changes. Deeper characterization using RNA-seq might reveal other genes that are more profoundly impacted by miR-26 depletion. It would strengthen the conclusions proposed if the authors validated that changes on mRNA abundance (Sra, Cd36) do impact the protein abundance. These relatively small changes or trends in mRNA expression, might not translate into changes in protein abundance.

      - Thanks for addressing this concern raised by R1 and R2.

      (7) In figures 5 and 7, the authors run a phosphorylation array (STK) to analyze the changes in the activity of the kinome. It seems that a relatively large number of signaling pathways are being altered; I think this should be strengthened by further validation by Western blot on the collected tissue samples. For quite a few of the kinases there might be antibodies that recognise phosphorylation. The two figures lack a mechanistic connection to the rest of the manuscript.

      - I appreciate the clarification provided by the authors regarding the difference between the activity assay and a Western blot for phosphorylated proteins. Is there any orthogonal technique to validate the PamGene activity assay available?

      Comments on revised version:

      The authors have addressed most of the changes suggested by R1 and R2.

    1. eLife Assessment

      This paper explores how diverse forms of inhibition impact firing rates in models for cortical circuits. In particular, the paper studies how the network operating point affects the balance of direct inhibition from SOM inhibitory neurons to pyramidal cells, and disinhibition from SOM inhibitory input to PV inhibitory neurons. This is an important issue as these two inhibitory pathways have largely been studied in isolation. A combination of analytical calculations and direct numerical simulations provides convincing evidence that the interplay of these inhibitory circuits can separately control network gain and stability.

    2. Reviewer #1 (Public review):

      Summary:

      This paper explores how diverse forms of inhibition impact firing rates in models for cortical circuits. In particular, the paper studies how the network operating point affects the balance of direct inhibition from SOM inhibitory neurons to pyramidal cells, and disinhibition from SOM inhibitory input to PV inhibitory neurons. This is an important issue as these two inhibitory pathways have largely been studied in isolation. A combination of analytical calculations and direct numerical simulations provides convincing evidence that the interplay of these inhibitory circuits can separately control network gain and stability.

      Strengths

      The paper has improved in revision, and the intuitive summary statements added to the end of each results section are quite helpful. The addition of numerical simulations to extend the conclusions beyond the linear range of network behavior is also quite helpful.

      Weaknesses

      None

    3. Reviewer #2 (Public review):

      Summary:

      Bos and colleagues address the important question of how two major inhibitory interneuron classes in the neocortex differentially affect cortical dynamics. They address this question by studying Wilson-Cowan-type mathematical models. Using a linearized fixed point approach, and subsequent simulations of neural circuits operating in the dynamic stochastically-driven regime, they provide compelling evidence that the existence of multiple interneuron classes can explain the counterintuitive finding that inhibitory modulation can increase the gain of the excitatory cell population while also increasing the stability of the circuit's state to minor perturbations. This effect depends on the connection strengths within their circuit model, providing important guidance as to when and why it arises.

      Overall, I find this study to have substantial merit. The authors have also done a commendable job of revising the paper in light of the critiques raised by myself and the other reviewers.

      Strengths:

      (1) The thorough investigation of how changes in the connectivity structure affect the gain-stability relationship is a major strength of this work. It provides an opportunity to understand when and why gain and stability will or will not both increase together. It also provides a nice bridge to the experimental literature, where different gain-stability relationships are reported from different studies.

      (2) The simplified and abstracted mathematical model has the benefit of facilitating our understanding of this puzzling phenomenon. It is not easy to find the right balance between biologically-detailed models vs simple but mathematically tractable ones, and I think the authors struck an excellent balance in this study.

      (3) While the fixed-point analysis has potentially substantial limitations for understanding cortical computations away from the steady-state, the authors used simulations to verify that their main findings hold in the stochastically-driven regime that more closely reflects the dynamics observed in in vivo neuroscience experiments.

      Weaknesses:

      (1) As the authors note in their Discussion, it would be worthwhile to study this effect in chaotic and/or oscillatory regimes, in addition to the ones they included here. I agree with their assessment that those investigations should be left for a future study.

      (2) The analysis is limited to paths within this simple E,PV,SOM circuit. This misses more extended paths (like thalamocortical loops) that involve interactions between multiple brain areas. Including those paths in the expansion in Eqs. 11-14 (Fig. 1C) may be an important direction for future work.

    4. Reviewer #3 (Public review):

      Summary:

      Bos et al study a computational model of cortical circuits with excitatory (E) and two subtypes of inhibition - parvalbumin (PV) and somatostatin (SOM) expressing interneurons. They perform stability and gain analysis of simplified models with nonlinear transfer functions when SOM neurons are perturbed. Their analysis suggests that in a specific setup of connectivity, instability and gain can be untangled, such that SOM modulation leads to both increase in stability and gain, in contrast to the typical direction in neuronal networks where increased gain results in decreased stability.

      Strengths:

      - Analysis of the canonical circuit in response to SOM perturbations. Through numerical simulations and mathematical analysis, the authors have provided a rather comprehensive picture of how SOM modulation may affect response changes.
      - Shedding light on two opposing circuit motifs involved in the canonical E-PV-SOM circuitry - namely, direct inhibition (SOM -> E) vs disinhibition (SOM -> PV -> E). These two pathways can lead to opposing effects, and it is often difficult to predict which one results from modulating SOM neurons. In simplified circuits, the authors show how these two motifs can emerge and depend on parameters like connection weights.
      - Suggesting potentially interesting consequences for cortical computation. The authors suggest that certain regimes of connectivity may lead to untangling of stability and gain, such that increases in network gain are not compromised by decreasing stability. They also link SOM modulation in different connectivity regimes to versatile computations in visual processing in simple models.
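
      To make the two opposing motifs concrete, the sketch below compares the lowest-order contributions of a small SOM perturbation to the input of E cells in a linearized rate picture. It is not the authors' model: the weight values are hypothetical, recurrent loops and neuronal gain factors are ignored, and the function name is an assumption.

```python
# Hypothetical effective weights (target <- source); inhibitory weights are negative.
W_E_SOM = -0.5   # SOM -> E, direct inhibition
W_PV_SOM = -0.8  # SOM -> PV
W_E_PV = -1.0    # PV -> E


def som_to_e_paths(w_e_som, w_pv_som, w_e_pv):
    """Lowest-order path contributions of a small SOM perturbation to E input."""
    direct = w_e_som                    # one synapse: SOM -> E
    disinhibition = w_e_pv * w_pv_som   # two synapses: SOM -> PV -> E
    return direct, disinhibition, direct + disinhibition


direct, disinhibition, net = som_to_e_paths(W_E_SOM, W_PV_SOM, W_E_PV)
print(direct, disinhibition, net)
```

      With these illustrative weights the disinhibitory path outweighs the direct one, so the net lowest-order effect on E is positive; reversing the relative strengths flips the sign, which is the parameter dependence the bullet above refers to.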

      Weaknesses:

      - Computationally, the analysis is solid, but it is very similar to previous studies (del Molino et al, 2017). Many studies in the past few years have done the perturbation analysis of a similar circuitry with or without nonlinear transfer functions (some of them listed in the references). This study applies the same framework to SOM perturbations, which is a useful computational analysis, in view of the complexity of the high-dimensional parameter space.
      - A general weakness of the paper is a lack of direct comparison to biological parameters or experiments. How can different experiments be reconciled by the results obtained here, and what new circuit mechanisms can be revealed? In its current form, the paper reads as a general suggestion that different combinations of gain modulation and stability can be achieved in a circuit model equipped with many parameters (12 parameters). This is potentially interesting but not surprising, given the high dimensional space of possible dynamical properties. A more interesting result would have been to relate this to biology, by providing reasoning why it might be relevant to certain circuits (and not others), or to provide some predictions or postdictions, which are currently not very strong in the manuscript.
      - Tuning curves are simulated for an individual orientation (same for all neurons), not considering the heterogeneity of neuronal networks with multiple orientation selectivity (and other visual features) - making the model too simplistic.

    1. eLife Assessment

      This is an important study showing that people who are hungry (vs. sated) put more weight on taste (vs. health) in their food choices. The experiment is well-designed and includes choice behavior, eye-tracking, and state-of-the-art computational modeling, resulting in compelling evidence supporting the conclusions.

    2. Reviewer #1 (Public review):

      Summary:

      In this article, the authors set out to understand how people's food decisions change when they are hungry vs. sated. To do so, they used an eye-tracking experiment where participants chose between two food options, each presented as a picture of the food plus its "Nutri-Score". In both conditions, participants fasted overnight, but in the sated condition, participants received a protein shake before making their decisions. The authors find that participants in the hungry condition were more likely to choose the tastier option. Using variants of the attentional drift diffusion model, they further find that the best fitting model has different attentional discounts on the taste and health attributes, and that the attentional discount on the health information was larger for the hungry participants.
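
      For readers unfamiliar with attribute-wise attentional discounting, the following sketch shows one common way such a drift rate can be parameterized, with a separate discount for each attribute when it is not being looked at. It is an illustration under stated assumptions, not the authors' fitted model: the functional form, parameter names (`theta`, `phi_taste`, `phi_health`, `w_taste`, `d`), and values are all hypothetical.

```python
def drift_rate(taste, health, look_option, look_attr,
               w_taste=0.5, theta=0.5, phi_taste=0.5, phi_health=0.25, d=1.0):
    """Drift toward option A in a multi-attribute attentional DDM sketch.

    taste and health are (A, B) value pairs. theta discounts the
    non-fixated option; phi_taste and phi_health discount the taste and
    health attributes when they are not fixated (smaller = stronger
    discount). All parameter names and values are illustrative.
    """
    w_health = 1.0 - w_taste
    # Option-level discount: the unattended option is scaled by theta.
    scale_a, scale_b = (1.0, theta) if look_option == "A" else (theta, 1.0)
    # Attribute-level discount: the unattended attribute is scaled by its phi.
    att_taste = 1.0 if look_attr == "taste" else phi_taste
    att_health = 1.0 if look_attr == "health" else phi_health
    taste_diff = scale_a * taste[0] - scale_b * taste[1]
    health_diff = scale_a * health[0] - scale_b * health[1]
    return d * (w_taste * att_taste * taste_diff
                + w_health * att_health * health_diff)


# Option A is tastier, option B is healthier; gaze is on A's taste.
strong = drift_rate((1.0, 0.0), (0.0, 1.0), "A", "taste", phi_health=0.25)
weak = drift_rate((1.0, 0.0), (0.0, 1.0), "A", "taste", phi_health=0.5)
print(strong, weak)
```

      In this sketch, a stronger discount on unattended health information (a smaller `phi_health`) pushes the drift further toward the tastier option, which is the direction of the effect reported for the hungry condition.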

      Strengths:

      The article has many strengths. It uses a food-choice paradigm that is established in neuroeconomics. The experiment uses real foods, with accurate nutrition information, and incentivized choices. The experimental manipulation is elegant in its simplicity - administering a high-calorie protein shake. It is also commendable that the study was within-participant. The experiment also includes hunger and mood ratings to confirm the effectiveness of the manipulation. The modeling work is impressive in its rigor - the authors test 8 different variants of the DDM, including recent models like the maaDDM, as well as some completely new variants (maaDDM2phi and 2phisp). The model fits decisively favor the maaDDM2phi.

      Weaknesses:

      While I do appreciate the within-participant design, it does raise a small concern about potential demand effects. The authors' results would have been more compelling if they had replicated when only analyzing the first session from each participant. However, the authors did demonstrate that there was no effect of order on the results, which helps to alleviate this concern.

    3. Reviewer #2 (Public review):

      Summary:

      This study investigates the effect of fed vs hungry state on food decision making.

      70 participants performed a computerized food choice task with eye tracking. Food images came from a validated set with variability in food attributes. Foods ranged from low caloric density unprocessed (fruits) to high caloric density processed foods (chips and cookies).

      Prior to the choice task participants rated images for taste, health, wanting, and calories. In the choice task participants simply selected one of two foods. They were told to pick the one they preferred. Screens consisted of two food pictures along with their "Nutri-Score". They were told that one preferred food would be available for consumption at the end.

      A drift-diffusion model (DDM) was fit to the reaction time values. Eye tracking was used to measure dwell time on each part of the monitor.

      Findings: participants tended to select the item they had rated as "tastier", however, health also contributed to decisions.

      Strengths:

      The most interesting and innovative aspect of the paper is the use of the DDM models to infer from reaction time and choice the relative weight of the attributes.

      Were the ratings re-done at each session? E.g. were all tastiness ratings for the sated session made while sated? This is relevant as one would expect the ratings of tastiness and wanting to be affected by current fed state.

      Weaknesses:

      My main criticism, which doesn't affect the underlying results, is that the labeling of food choices as being taste- or health-driven is misleading. Participants were not cued to select health vs taste. Studies in which people were cued to select for taste vs health exist (and are cited here). Also, the label "healthy" is misleading, as here it seems to be strongly related to caloric density. A high-calorie food is not intrinsically unhealthy (even if people rate it as such). The suggestion that hunger impairs making healthy decisions is not quite the correct interpretation of the results here (even though everyone knows it to be true). Another interpretation is that hungry people in negative calorie balance simply prefer more calories.

      Comments on revisions: No further comments - all my questions addressed.

    4. Reviewer #3 (Public review):

      Summary:

      This well-powered study tested the effects of hunger on value-based dietary decision-making. The main hypothesis was that attentional mechanisms guide choices toward unhealthier and tastier options when participants are hungry, that is, in the fasted state compared with the satiated state. Participants were tested twice - in a fasted state and in a satiated state after consuming a protein shake. Attentional mechanisms were measured during dietary decision-making by linking food choices and reaction times to eye-tracking data and mathematical drift-diffusion models. The results showed that hunger makes high-conflict food choices more taste-driven and less health-driven. This effect was formally mediated by relative dwell time, which approximates attention drawn to chosen relative to unchosen options. Computational modeling showed that a drift-diffusion model, which assumed that food choices result from a noisy accumulation of evidence from multiple attributes (i.e., taste and health) and discounted non-looked attributes and options, best explained observed choices and reaction times.

      Strengths:

      This study's findings are valuable for understanding how energy states affect decision-making and provide an answer to how hunger can lead to unhealthy choices. These insights are relevant to psychology, behavioral economics, and behavioral change intervention designs.

      The study has a well-powered sample size and hypotheses were pre-registered. The analyses comprised classical linear models and non-linear computational modeling to offer insight into putative cognitive mechanisms.

      In summary, the study advances the understanding of the links between energy states and value-based decision-making by showing that energy depletion is powerful in shaping the formation of food preferences. Moreover, the computational analysis part offers a plausible mechanistic explanation at the algorithmic level of observed effects.

      Weaknesses:

      Some parts of the positioning of the hunger state manipulation and the interpretation of its effects could be improved.

      On the positioning side, it does not seem like a 'bad' decision to replenish energy states when hungry by preferring tastier, more often caloric options. In this sense, it is unclear whether the observed behavior in the fasted state is a fallacy or a response to signals from the body. The introduction does mention these two aspects of preferring more caloric food when hungry. However, some ambiguity remains about whether the study results indeed reflect suboptimal choice behavior or a healthy adaptive behavior to restore energy stores.

      On the interpretation side, previous work has shown that beliefs about the nourishing and hunger-killing effectiveness of drinks or substances influence subjective and objective markers of hunger, including value-based dietary decision-making, and attentional mechanisms approximated by computational models and the activation of cognitive control regions in the brain. The present study shows differences between the protein shake and a natural history condition (fasted state). This experimental design, however, cannot distinguish between alternative interpretations of the observed effects. Notably, effects could be due to (a) the drink's active, nourishing ingredients, (b) consuming a drink versus nothing, or (c) both.

      Comments on revisions:

      The authors addressed all my comments appropriately and I have no further requests. Thank you for the added discussion of findings and extra analyses.

    1. eLife assessment

      Using intracellular in vitro and in vivo recordings and a deep learning approach, this study shows that mouse dentate gyrus mossy cells (MCs) and CA3 pyramidal cells process information from an important electrophysiological hallmark of the hippocampus, sharp wave-ripples (SWRs). The innovative use of deep learning to predict SWR waveforms from MC membrane potentials represents an interesting methodological advance. While the key findings are potentially fundamental, some of the evidence is currently incomplete and should be revised to better support the findings.

    2. Reviewer #1 (Public Review):

      The authors recorded from multiple mossy cells (MCs) of the dentate gyrus in slices or in vivo using anesthesia. They recorded MC spontaneous activity during spontaneous sharp waves (SWs) detected in area CA3 (in vitro) or in CA1 (in vivo). They find variability of the depolarization of MCs in response to a SW. They then used deep learning to parse out more information. They conclude that CA3 sends different "information" to different MCs. However, this is not surprising because different CA3 neurons project to different MCs and it was not determined if every SW reflected the same or different subsets of CA3 activity.

      The strengths include recording up to 5 MCs at a time. The major concerns are in the finding that there is variability. This seems logical, not surprising. Also it is not clear how deep learning could lead to the conclusion that CA3 sends different "information" to different MCs. It seems already known from the anatomy because CA3 neurons have diverse axons so they do not converge on only one or a few MCs. Instead they project to different MCs. Even if they would, there are different numbers of boutons and different placement of boutons on the MC dendrites, leading to different effects on MCs. There also is a complex circuitry that is not taken into account in the discussion or in the model used for deep learning. CA3 does not only project to MCs. It also projects to hilar and other dentate gyrus GABAergic neurons which have complex connections to each other, MCs, and CA3. Furthermore, MCs project to MCs, the GABAergic neurons, and CA3. Therefore at any one time that a SW occurs, a very complex circuitry is affected and this could have very different effects on MCs so they would vary in response to the SW. This is further complicated by use of slices where different parts of the circuit are transected from slice to slice.

      It is also not discussed if SWs have a uniform frequency during the recording session. If they cluster, or if MC action potentials occur just before a SW, or other neurons discharge before, it will affect the response of the MC to the SW. If MC membrane potential varies, this will also affect the depolarization in response to the SW.

      In vivo, the SWs may be quite different than in vitro but this is not discussed. The circuitry is quite different from in vitro. The effects of urethane could have many confounding influences.

      Furthermore, how much the in vitro and in vivo SWs tell us about SWs in awake behaving mice is unclear.

      Also, methods and figures are hard to understand.

    3. Reviewer #2 (Public Review):

      • A summary of what the authors were trying to achieve

      Drawing from theoretical insights on the pivotal role of mossy cells (MCs) in pattern separation - a key process in distinguishing between similar memories or inputs - the authors investigated how MCs in the dentate gyrus of the hippocampus encode and process complex neural information. By recording from up to five MCs simultaneously, they focused on membrane potential dynamics linked to sharp wave-ripple complexes (SWRs) originating from the CA3 area. Indeed, using a machine learning approach, they were able to demonstrate that even a single MC's synaptic input can predict a significant portion (approximately 9%) of SWRs, and extrapolation suggested that synaptic input obtained from 27 MCs could account for 90% of the SWR patterns observed. The study further illuminates how individual MCs contribute to a distributed but highly specific encoding system. It demonstrates that SWR clusters associated with one MC seldom overlap with those of another, illustrating a precise and distributed encoding strategy across the MC network.

      • An account of the major strengths and weaknesses of the methods and results

      Strengths:

      (1) This study is remarkable because it establishes a critical link between the subthreshold activities of individual neurons and the collective dynamics of neuronal populations.

      (2) The authors utilize machine learning to bridge these levels of neuronal activity. They skillfully demonstrate the predictive power of membrane potential fluctuations for neuronal events at the population level and offer new insights into neuronal information processing.

      (3) To investigate sharp wave/ripple-related synaptic activity in mossy cells (MCs), the authors performed challenging experiments using whole-cell current-clamp recordings. These recordings were obtained from up to five neurons in vitro and from single mossy cells in live mice. The latter recordings are particularly valuable as they add to the limited published data on synaptic input to MCs during in vivo ripples.

      Weaknesses:

      (1) The model description could significantly benefit from additional details regarding its architecture, training, and evaluation processes. Providing these details would enhance the paper's transparency, facilitate replication, and strengthen the overall scientific contribution. For further details, please see below.

      (2) The study recognizes the concept of pattern separation, a central process in hippocampal physiology for discriminating between similar inputs to form distinct memories. The authors refer to a theoretical paper by Myers and Scharfman (2011) that links pattern separation with activity backpropagating from CA3 to mossy cells. Despite this initial citation, the concept is not discussed again in the context of the new findings. Given the significant role of MCs in the dentate gyrus, where pattern separation is thought to occur, it would be valuable to understand the authors' perspective on how their findings might relate to or contribute to existing theories of pattern separation. Could the observed functions of MCs elucidated in this study provide new insights into their contribution to processes underlying pattern separation?

      (3) Previous work concluded that sharp waves are associated with mossy cell inhibition, as evidenced by a consistent ripple function-related hyperpolarization of the membrane potential in these neurons when recorded at resting membrane potential (Henze & Buzsáki, 2007). In contrast, the present study reveals an SWR-induced depolarization of the membrane potential. Can the authors explain the observed modulation of the membrane potential during CA1 ripples in more detail? What was the proportion of cases of depolarization or hyperpolarization? What were the respective amplitude distributions? Were there cases of activation of the MCs, i.e., spiking associated with the ripple? This more comprehensive information would add significance to the study as it is not currently available in the literature.

      (4) In the study, the observation that mossy cells (MCs) in the lower (infrapyramidal) blade of the dentate gyrus (DG) show higher predictability in SWR patterns is both intriguing and notable. This finding, however, appears to be mentioned without subsequent in-depth exploration or discussion. One wonders if this observed predictability might be influenced by potential disruptions or severed connections inherent to the brain slice preparation method used. Furthermore, it prompts the question of whether similar observations or trends have been noted in MCs recorded in vivo, which could either corroborate or challenge this intriguing in vitro finding.

      (5) The study's comparison of SWR predictability by mossy cells (MCs) is complicated by using different recording sites: CA3 for in vitro and CA1 for in vivo experiments, as shown in Fig. 2. Since CA1-SWRs can also arise from regions other than CA3 (see e.g. Oliva et al., 2016, Yamamoto and Tonegawa, 2017), it is difficult to reconcile in vitro and in vivo results. Addressing this difference and its implications for MC predictability in the results discussion would strengthen the study.

      • An appraisal of whether the authors achieved their aims, and whether the results support their conclusions

      As outlined in the abstract and introduction, the primary aim is to investigate the role of MCs in encoding neuronal information during sharp wave ripple complexes, a crucial neuronal process involved in memory consolidation and information transmission in the hippocampus. It is clear from the comprehensive details in this study that the authors have meticulously pursued their goals by providing extensive experimental evidence and utilizing innovative machine learning techniques to investigate the encoding of information in the hippocampus by mossy cells (MCs). Together, this study provides a compelling account supported by rigorous experimental and analytical methods. Linking subthreshold membrane potentials and population activity by machine learning provides a comprehensive new analytic approach and sheds new light on the role of MCs in information processing in the hippocampus. The study not only achieves the stated goals, but also provides novel methodology, and valuable insights into the dynamics of neural coding and information flow in the hippocampus.

      • A discussion of the likely impact of the work on the field, and the utility of the methods and data to the community

      Impact: Both the novel methodology and the provided biological insights will be of great interest to the community.

      Utility of methods/data: The applied deep learning approach will be of particular interest if the authors provide more details to improve its reproducibility (see related suggestions below).

    4. Reviewer #3 (Public Review):

      Compared to the pyramidal cells of the CA1 and CA3 regions of the hippocampus, and the granule cells of the dentate gyrus (DG), the computational role(s) of mossy cells of the DG have received much less attention over the years and are consequently not well understood. Mossy cells receive feedforward input from granule cells and feedback from CA3 cells. One significant factor is the compression of the large number of CA3 cells that input onto a much smaller population of mossy cells, which then send feedback connections to the granule cell layer. The present paper seeks to understand this compression in terms of neural coding, and asks whether the subthreshold activity of a small number of mossy cells can predict above chance levels the shapes of individual SWs produced by the CA3 cells. Using elegant multielectrode intracellular recordings of mossy cells, the authors use deep learning networks to show that they can train the network to "predict" the shape of a SW that preceded the intracellular activity of the mossy cells. Putatively, a single mossy cell can predict the shape of SWs above chance. These results are interesting, but there are some conceptual issues and questions about the statistical tests that must be addressed before the results can be considered convincing.

      Strengths

      (1) The paper uses technically challenging techniques to record from multiple mossy cells at the same time, while also recording SWs from the LFP of the CA3 layer. The data appear to be collected carefully and analyzed thoughtfully.

      (2) The question of how mossy cells process feedback input from CA3 is important to understand the role of this feedback pathway in hippocampal processing.

      (3) Provided that the concerns expressed below about proper statistical testing are resolved, the data appear supportive of the main conclusions of the authors and suggest that, to some degree, the much smaller population of mossy cells can conserve the information present in the larger population of CA3 cells, presumably by using a more compressed, dense population code.

      Weaknesses

      (4) Some of the statistical tests appear inappropriate because they treat each CA3 SW and associated Vm from a mossy cell as independent samples. This violates the assumptions of statistical tests such as the Kolmogorov-Smirnov tests of Figure 3C and Fig 3E. Although there is large variability among the SWs recorded and among the Vm's, they cannot be considered independent measurements if they derive from the same cell and same recording site of an individual animal. This becomes especially problematic when the number of dependent samples adds up to the tens of thousands, providing highly inflated numbers of samples that artificially reduce the p values. Techniques such as mixed-effects models are being increasingly used to factor out the effects of within-cell and within-animal correlations in the data. The authors need to do something similar to factor out these contributions in order to perform statistical tests, throughout the manuscript wherever this problem occurs.

      (5) A separate statistical problem occurs when comparing real data against a shuffled, surrogate data set. From the methods, I gather that Figure 3C combined data from 100 surrogate shuffles to compare to the real data. It is inappropriate to do a classic statistical test of data against such shuffles, because the points in the pooled surrogate data sets are not true samples from a population. It is a mathematical certainty that one can eventually drive a p value to < 0.05 just by increasing the number of shuffles sufficiently. Thus, the p value is determined by the number of computer shuffles allowed by the time and processing power of a computer, rather than by sampling real data from the population. Figures such as 4C and 5A are examples that test data against shuffle appropriately, as a single value is determined to be within or outside the 95% confidence interval of the shuffle, and this determination is not directly affected by the number of shuffles performed.

      (6) The last line of the Discussion states that this study provides "important insights into the information processing of neural circuits at the bottleneck layer," but it is not clear what these insights are. If the statistical problems are addressed appropriately, then the results do demonstrate that the information that is reflected in SWs can be reconstructed by cells in the MC bottleneck, but it is not certain what conceptual insights the authors have in mind. They should discuss more how these results further our understanding of the function of the feedback connection from CA3 to the mossy cells, and discuss any limitations on their interpretation from recording LFPs rather than the single-unit ensemble activity (where the information is really encoded).

      (7) In Figure 1C, the maximum of the MC response on the first inset precedes the SW, and the onset of the Vm response may be simultaneous with SW. This would suggest that the SW did not drive the mossy cell, but this was a coincident event. How many SW-mossy cell recordings are like this? Do the authors have a technical reason to believe that these are events in which the mossy cell is driven by the CA3 cells active during the SW?
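
      The kind of shuffle comparison endorsed in point (5) can be sketched as follows: a single observed statistic is compared with the central 95% interval of its shuffle distribution. This is a minimal illustration under stated assumptions, not the authors' analysis. The statistic (a Pearson correlation between paired values), the function names, and the number of shuffles are all hypothetical choices, and the sketch does not address the within-cell dependence raised in point (4).

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def shuffle_interval_test(x, y, n_shuffles=1000, seed=0):
    """Compare one observed statistic with the 95% interval of a shuffle null.

    The pairing between x and y is destroyed by shuffling y. Returns the
    observed value, the (2.5th, 97.5th) percentile bounds of the null,
    and whether the observed value falls outside those bounds.
    """
    rng = random.Random(seed)
    observed = pearson_r(x, y)
    y_shuffled = list(y)
    null = []
    for _ in range(n_shuffles):
        rng.shuffle(y_shuffled)
        null.append(pearson_r(x, y_shuffled))
    null.sort()
    lower = null[int(0.025 * n_shuffles)]
    upper = null[int(0.975 * n_shuffles) - 1]
    return observed, (lower, upper), not (lower <= observed <= upper)
```

      Here, adding shuffles only refines the estimate of the interval bounds. It does not add samples to a pooled two-sample test, so the outcome is not driven by the number of shuffles.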

    5. Author response:

      Reviewer #1 (Public Review):

      We are grateful to this reviewer for her/his constructive comments, which have greatly improved our work. Individual responses are provided below.

      The authors recorded from multiple mossy cells (MCs) of the dentate gyrus in slices or in vivo using anesthesia. They recorded MC spontaneous activity during spontaneous sharp waves (SWs) detected in area CA3 (in vitro) or in CA1 (in vivo). They find variability of the depolarization of MCs in response to a SW. They then used deep learning to parse out more information. They conclude that CA3 sends different "information" to different MCs. However, this is not surprising because different CA3 neurons project to different MCs and it was not determined if every SW reflected the same or different subsets of CA3 activity.

      Thank you for your valuable comments. We agree that our finding that different MCs receive different information is unsurprising. These data are, in fact, to be expected from the anatomical knowledge of the circuit structure. However, as a physiological finding, there is a certain value in proving this fact; please note that it was not clear whether the neural activity of individual MCs received heterogeneous/variable information at the physiological level. It was therefore necessary to investigate this by recording neural activity. We believe this study is important because it quantitatively demonstrates this fact.

      The strengths include recording up to 5 MCs at a time. The major concerns are in the finding that there is variability. This seems logical, not surprising. Also it is not clear how deep learning could lead to the conclusion that CA3 sends different "information" to different MCs. It seems already known from the anatomy because CA3 neurons have diverse axons so they do not converge on only one or a few MCs. Instead they project to different MCs. Even if they would, there are different numbers of boutons and different placement of boutons on the MC dendrites, leading to different effects on MCs. There also is a complex circuitry that is not taken into account in the discussion or in the model used for deep learning. CA3 does not only project to MCs. It also projects to hilar and other dentate gyrus GABAergic neurons which have complex connections to each other, MCs, and CA3. Furthermore, MCs project to MCs, the GABAergic neurons, and CA3. Therefore at any one time that a SW occurs, a very complex circuitry is affected and this could have very different effects on MCs so they would vary in response to the SW. This is further complicated by use of slices where different parts of the circuit are transected from slice to slice.

      The first half of this paragraph is closely related to the previous paragraph. We propose that the variation in membrane potential of the simultaneously recorded MCs allows for the expression of diverse information. We also believe that this is highly novel in that no previous work has described the extent to which SWR is encoded in MCs. Our study proposes a new quantitative method that relates two variables (LFP and membrane potential) that are inherently incomparable. Specifically, we used machine learning (please note that it is a neural network, but not "deep learning") to achieve this quantification, and we believe this innovation is noteworthy.

      In the latter part of this article, you raise another important point. First, we would like to point out that this comment contains a slight misunderstanding. Our goal is not to reproduce the circuit structure of the hippocampus in silico but to propose a "function (or mapping/transformation)" that connects the two different modalities, i.e., LFP and Vm. This function should be as simple as possible, which is desirable from an explanatory point of view. In this respect, our machine learning model is a 'perceptron'-like 3-layer neural network. One of the simplest classical neural network models can predict the LFP waveform from Vm, which is quite surprising and an achievement we did not even imagine before. The fact that our model does not consider dendrites or inhibitory neurons is not a drawback but an important advantage. On the other hand, the fact that the data we used for our predictions were primarily obtained using slice experiments may be a drawback of this study, and we agree with your comments. However, we can argue that the new quantitative method we propose here is versatile since we showed that the same machine learning can be used to predict in vivo single-cell data.
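
      To make concrete what a 'perceptron'-like 3-layer network mapping one signal onto another can look like, a minimal sketch follows. It is not the authors' model: the layer sizes, the tanh hidden activation, the squared-error loss, the learning rate, and the names `TinyMLP` and `fit` are all assumptions made for illustration. The input vector stands in for a Vm trace and the target vector for an LFP waveform.

```python
import math
import random


class TinyMLP:
    """Input -> tanh hidden layer -> linear output, trained by gradient descent."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
                   for _ in range(n_out)]
        self.b2 = [0.0] * n_out

    def forward(self, x):
        """Return hidden activations and the predicted output vector."""
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        y = [sum(w * hi for w, hi in zip(row, h)) + b
             for row, b in zip(self.w2, self.b2)]
        return h, y

    def train_step(self, x, target, lr=0.05):
        """One gradient step on the loss 0.5 * sum((y - target)^2)."""
        h, y = self.forward(x)
        err = [yi - ti for yi, ti in zip(y, target)]
        # Back-propagate to the hidden layer before the output weights change.
        dh = [sum(err[k] * self.w2[k][j] for k in range(len(err))) * (1 - h[j] ** 2)
              for j in range(len(h))]
        for k, e in enumerate(err):
            for j, hj in enumerate(h):
                self.w2[k][j] -= lr * e * hj
            self.b2[k] -= lr * e
        for j, d in enumerate(dh):
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * d * xi
            self.b1[j] -= lr * d
        return 0.5 * sum(e * e for e in err)


def fit(model, samples, epochs=50, lr=0.05):
    """Train on (input, target) pairs and return the mean loss per epoch."""
    history = []
    for _ in range(epochs):
        total = sum(model.train_step(x, t, lr) for x, t in samples)
        history.append(total / len(samples))
    return history
```

      In practice, such a model would be evaluated on held-out events and against shuffled pairings, since a falling training loss alone does not show that the mapping generalizes.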

      It is also not discussed if SWs have a uniform frequency during the recording session. If they cluster, or if MC action potentials occur just before a SW, or other neurons discharge before, it will affect the response of the MC to the SW. If MC membrane potential varies, this will also affect the depolarization in response to the SW.

      Thank you for raising an important point. We have done some additional analyses in response to your comment. First, we plotted how the SWR parameter fluctuated during our recording time (especially for data recorded for long periods of more than 5 minutes). As shown in the new Figure 1 - figure supplement 4, we can see that the frequency of SWRs was kept uniform during the recording time. These data ensure the rationale for pooling data over time.

      We also calculated the average membrane potentials of MCs before and after SWRs and found that MCs did not show depolarization or hyperpolarization before SWs, unlike Vm of CA1 neurons. These data indicate that the surrounding circuitry was not particularly active before SW, eliminating any concern that such unexpected preceding activity might affect our analysis. These data are shown in Figure 1 - figure supplement 2.

      In vivo, the SWs may be quite different than in vitro but this is not discussed. The circuitry is quite different from in vitro. The effects of urethane could have many confounding influences. Furthermore, how much the in vitro and in vivo SWs tell us about SWs in awake behaving mice is unclear.

      We agree with this point. Ideally, recording in vitro and in vivo under conditions as similar as possible would be optimal. However, as you know, patch-clamp recording from mossy cells in vivo is technically challenging, and currently, there is no alternative to conducting experiments under anesthesia. We believe that science advances not merely through theoretical discourse, but by contributing empirical data collected under existing conditions. However, as we mentioned in the paper, we believe that in vivo and in vitro SWR share some properties and a common principle of occurrence. We also observed that there are similar characteristics in the membrane potential response of MC to SWR. However, as you have pointed out, data derived from these limitations require careful interpretation, and we have explicitly stated in the paper that not only are there such problems, but that there are also common properties in the data obtained in vivo and in vitro (Page 12, Line 357).

      Also, methods and figures are hard to understand as described below.

      Thank you for all your comments. We have carefully considered the reviewers' comments and improved the text and legend. We hope you will take the time to review them.

      Reviewer #2 (Public Review):

      Thank you for the positive evaluations, which have encouraged us to resubmit this manuscript. We have revised our manuscript in accordance with your comments. Our point-by-point responses are as follows:

      • A summary of what the authors were trying to achieve

      Drawing from theoretical insights on the pivotal role of mossy cells (MCs) in pattern separation - a key process in distinguishing between similar memories or inputs - the authors investigated how MCs in the dentate gyrus of the hippocampus encode and process complex neural information. By recording from up to five MCs simultaneously, they focused on membrane potential dynamics linked to sharp wave-ripple complexes (SWRs) originating from the CA3 area. Indeed, using a machine learning approach, they were able to demonstrate that even a single MC's synaptic input can predict a significant portion (approximately 9%) of SWRs, and extrapolation suggested that synaptic input obtained from 27 MCs could account for 90% of the SWR patterns observed. The study further illuminates how individual MCs contribute to a distributed but highly specific encoding system. It demonstrates that SWR clusters associated with one MC seldom overlap with those of another, illustrating a precise and distributed encoding strategy across the MC network.

      We appreciate that this reviewer found scientific value in our manuscript. Thanks to the comments, we were pleased to be able to revise and improve the manuscript. Individual responses are listed below:

      • An account of the major strengths and weaknesses of the methods and results

      Strengths:

      (1) This study is remarkable because it establishes a critical link between the subthreshold activities of individual neurons and the collective dynamics of neuronal populations.

      (2) The authors utilize machine learning to bridge these levels of neuronal activity. They skillfully demonstrate the predictive power of membrane potential fluctuations for neuronal events at the population level and offer new insights into neuronal information processing.

      (3) To investigate sharp wave/ripple-related synaptic activity in mossy cells (MCs), the authors performed challenging experiments using whole-cell current-clamp recordings. These recordings were obtained from up to five neurons in vitro and from single mossy cells in live mice. The latter recordings are particularly valuable as they add to the limited published data on synaptic input to MCs during in vivo ripples.

      We appreciate the reviewer’s critical evaluations, which have encouraged us to revise and resubmit this manuscript. We have revised our manuscript in line with the reviewer’s comments. Our point-by-point responses are provided below:

      Weaknesses:

      (1) The model description could significantly benefit from additional details regarding its architecture, training, and evaluation processes. Providing these details would enhance the paper's transparency, facilitate replication, and strengthen the overall scientific contribution. For further details, please see below.

      Thank you for the suggestions. We have responded with model details based on the following comments.

      (2) The study recognizes the concept of pattern separation, a central process in hippocampal physiology for discriminating between similar inputs to form distinct memories. The authors refer to a theoretical paper by Myers and Scharfman (2011) that links pattern separation with activity backpropagating from CA3 to mossy cells. Despite this initial citation, the concept is not discussed again in the context of the new findings. Given the significant role of MCs in the dentate gyrus, where pattern separation is thought to occur, it would be valuable to understand the authors' perspective on how their findings might relate to or contribute to existing theories of pattern separation. Could the observed functions of MCs elucidated in this study provide new insights into their contribution to processes underlying pattern separation?

      Thank you for your valuable comment. The role of MCs in pattern separation is described in the discussion as follows:

      “It has been shown through theoretical models that MCs are a contributor to pattern separation (Myers and Scharfman, 2011). In general, the pathway of neural information is diverged from the entorhinal cortex through the larger granule cell layer and then compressed into the smaller CA3 cell layer. In this case, there is a high possibility of information loss during the transmission process. Thus, a backprojection mechanism via MCs has been proposed as a device to prevent information loss. Indeed, in theoretical models, such backprojection improves pattern separation and memory capacity, and the results are closer to experimental data than models without built-in backprojection. However, it was unclear what information individual MCs receive during backprojection. Our results show that CA3 SWR is distributed and encoded in the MC population, and that even though the number of MCs is smaller than in other regions, it is possible to reproduce about 30% of the SWR in CA3 from the membrane potential of only five MCs. Based on these results, it is believed that MCs not only play a role in preventing information loss, but also play a role in receiving some kind of newly encoded memory information in the CA3 region, and it is highly likely that the information contained in the backprojections is different from the neural information transmitted through conventional transmission pathways. Indeed, the fact that the information replayed in CA3 is reflected as SWR and propagated to each brain region suggests that the newly encoded memory information in CA3 is propagated to MC. If  backprojection simply returned the information transmitted from DG to CA3, and to MC, this would be unrealistic and extremely inefficient. However, it is still unclear what kind of memory information is actually backprojected and distributed to the MC, and how it differs from the memory information transmitted in the forward direction. 
      These are open questions that need to be addressed in future experiments in awake animals.” (Page 11, Line 333)

      (3) Previous work concluded that sharp waves are associated with mossy cell inhibition, as evidenced by a consistent ripple function-related hyperpolarization of the membrane potential in these neurons when recorded at resting membrane potential (Henze & Buzsáki, 2007). In contrast, the present study reveals an SWR-induced depolarization of the membrane potential. Can the authors explain the observed modulation of the membrane potential during CA1 ripples in more detail? What was the proportion of cases of depolarization or hyperpolarization? What were the respective amplitude distributions? Were there cases of activation of the MCs, i.e., spiking associated with the ripple? This more comprehensive information would add significance to the study as it is not currently available in the literature.

      We apologize for the confusion. To clarify, we did not state in the paper that MCs depolarized during SWRs in vivo. The following sentences have been added to the Results:

      “Previous research has shown that the hyperpolarization of MC membrane potential associated with SWR indicates that SWR is related to the inhibition of mossy cells (Henze and Buzsáki, 2007). However, our data showed that the proportion of cases of depolarization or hyperpolarization was about the same, with a slight excess of depolarization. It should be noted that MCs are highly active and fluctuating cells, and the determination of whether they are depolarized or hyperpolarized is highly dependent on the method of analysis. Moreover, the firing rate of MCs that we recorded was 1.07 ± 0.93 Hz (mean ± SD from 6 cells, 6 mice), and 6.68 ± 4.79% (mean ± SD from 6 cells, 6 mice, n = 757 SWR events) of all SWRs recruited MC firing (calculated as firing within 50 ms after the SWR peak).” (Page 5, Line 143)
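      For illustration only, the recruitment measure quoted above (the percentage of SWRs followed by MC firing within 50 ms after the SWR peak) could be computed per cell along the following lines. This is a minimal sketch and not the authors' analysis code: the function name, the use of seconds as the time unit, and the choice to count a spike that falls exactly on the peak are all assumptions.

```python
from bisect import bisect_left


def fraction_of_swrs_with_firing(swr_peak_times, spike_times, window_s=0.050):
    """Return the fraction of SWRs followed by at least one spike.

    Hypothetical helper: times are assumed to be in seconds, and a spike
    counts if it falls in [peak, peak + window_s].
    """
    if not swr_peak_times:
        raise ValueError("need at least one SWR peak time")
    spikes = sorted(spike_times)
    recruited = 0
    for peak in swr_peak_times:
        # Index of the first spike at or after this SWR peak.
        i = bisect_left(spikes, peak)
        if i < len(spikes) and spikes[i] <= peak + window_s:
            recruited += 1
    return recruited / len(swr_peak_times)


if __name__ == "__main__":
    peaks = [1.0, 2.0, 3.0]
    spikes = [0.99, 1.02, 2.2, 3.04]
    print(fraction_of_swrs_with_firing(peaks, spikes))
```

      Applying such a function to each recorded cell and then taking the mean and SD across cells would give a summary in the same form as the one reported above.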

      (4) In the study, the observation that mossy cells (MCs) in the lower (infrapyramidal) blade of the dentate gyrus (DG) show higher predictability in SWR patterns is both intriguing and notable. This finding, however, appears to be mentioned without subsequent in-depth exploration or discussion. One wonders if this observed predictability might be influenced by potential disruptions or severed connections inherent to the brain slice preparation method used. Furthermore, it prompts the question of whether similar observations or trends have been noted in MCs recorded in vivo, which could either corroborate or challenge this intriguing in vitro finding.

      As you pointed out, we cannot rule out the possibility that this predictability is influenced by disruptions or severed connections inherent in the preparation of acute slices. In addition, because simultaneous SWR and MC patch-clamp recording is very difficult even under anesthesia, only six MCs were recorded in vivo, which limits any analysis by anatomical location. It is therefore difficult to establish statistical significance in the current data. We have added the following text to the Discussion:

      “In addition, the finding that SWR is more predictive when the recorded location of the MC is near the lower blade of the DG is unexpected, so the possibility that this result is influenced by potential disruptions or severed connections during the preparation of the acute slice cannot be ruled out.” (Page 14, Line 405)

      (5) The study's comparison of SWR predictability by mossy cells (MCs) is complicated by using different recording sites: CA3 for in vitro and CA1 for in vivo experiments, as shown in Fig. 2. Since CA1-SWRs can also arise from regions other than CA3 (see e.g. Oliva et al., 2016, Yamamoto and Tonegawa, 2017), it is difficult to reconcile in vitro and in vivo results. Addressing this difference and its implications for MC predictability in the results discussion would strengthen the study.

      Thank you for your comment. In response, we have added the following text to the Discussion:

      “In this study, we performed MC patch-clamp recording both in vivo and in vitro, and clarified that SWR can be predicted from V_m of MC in both cases. However, there are three caveats to the interpretation of these data. First, the in vivo SWR cannot be said to be exactly the same as the in vitro SWR: note that in vitro SWR has some similarities to in vivo SWR, such as spatial and spectral profiles and neural activity patterns (Maier et al., 2009; Hájos et al., 2013; Pangalos et al., 2013). The same concern applies to MC synaptic inputs. The in vivo V_m data may contain more information compared to the in vitro single MC data, because the entire projections that target MCs are intact, resulting in a complete set of synaptic inputs related to SWR activity, as opposed to slices where connections are severed. While we recognize these differences, it is also very likely that there are common ways of expressing information. Second, since the in vivo LFP recordings were obtained from the CA1 region, it is possible that the CA1-SWR receives input from the CA2 region (Oliva et al., 2016) and the entorhinal cortex (Yamamoto and Tonegawa, 2017). In addition, urethane anesthesia has been observed to reduce subthreshold activity, spike synchronization, and SWR (Yagishita et al., 2020), making it difficult to achieve complete agreement with in vitro SWR recorded from the CA3 region. Finally, although we were able to record MC V_m during in vivo SWR in this study, the in vivo dataset consisted of recordings from a single MC, in contrast to the in vitro dataset. To perform the same analysis as in the in vitro experiment, it would be desirable to record LFPs from the CA3 region and collect data from multiple MCs simultaneously, but this is technically very difficult.
      In this study, it was difficult to directly clarify the consistency between CA3 network activity and in vivo MC synaptic input, but the fact that the SWR waveform can be predicted from in vivo MC V_m in CA1-SWR may be the result of some CA3 network activity being reflected in CA1-SWR. It is undeniable that more accurate predictions would have been possible if it had been possible to record LFP from the CA3 regions in vivo.” (Page 12, Line 357)

      • An appraisal of whether the authors achieved their aims, and whether the results support their conclusions

      As outlined in the abstract and introduction, the primary aim is to investigate the role of MCs in encoding neuronal information during sharp wave ripple complexes, a crucial neuronal process involved in memory consolidation and information transmission in the hippocampus. It is clear from the comprehensive details in this study that the authors have meticulously pursued their goals by providing extensive experimental evidence and utilizing innovative machine learning techniques to investigate the encoding of information in the hippocampus by mossy cells (MCs). Together, this study provides a compelling account supported by rigorous experimental and analytical methods. Linking subthreshold membrane potentials and population activity by machine learning provides a comprehensive new analytic approach and sheds new light on the role of MCs in information processing in the hippocampus. The study not only achieves the stated goals, but also provides novel methodology, and valuable insights into the dynamics of neural coding and information flow in the hippocampus.

      We appreciate the reviewer’s critical evaluations, which have encouraged us to revise and resubmit this manuscript. We have revised our manuscript in line with the reviewer’s comments.

      • A discussion of the likely impact of the work on the field, and the utility of the methods and data to the community

      Impact: Both the novel methodology and the provided biological insights will be of great interest to the community.

      Utility of methods/data: The applied deep learning approach will be of particular interest if the authors provide more details to improve its reproducibility (see related suggestions below).

      We appreciate that this reviewer found scientific value in our manuscript, and we thank the reviewer for the comments.

      Reviewer #3 (Public Review):

      We appreciate that this reviewer raised several important issues. We are pleased to have been able to revise the paper into a better manuscript based on these comments. Individual responses are listed below:

      Compared to the pyramidal cells of the CA1 and CA3 regions of the hippocampus, and the granule cells of the dentate gyrus (DG), the computational role(s) of mossy cells of the DG have received much less attention over the years and are consequently not well understood. Mossy cells receive feedforward input from granule cells and feedback from CA3 cells. One significant factor is the compression of the large number of CA3 cells that input onto a much smaller population of mossy cells, which then send feedback connections to the granule cell layer. The present paper seeks to understand this compression in terms of neural coding, and asks whether the subthreshold activity of a small number of mossy cells can predict above chance levels the shapes of individual SWs produced by the CA3 cells. Using elegant multielectrode intracellular recordings of mossy cells, the authors use deep learning networks to show that they can train the network to "predict" the shape of a SW that preceded the intracellular activity of the mossy cells. Putatively, a single mossy cell can predict the shape of SWs above chance. These results are interesting, but there are some conceptual issues and questions about the statistical tests that must be addressed before the results can be considered convincing.

      We appreciate that this reviewer found scientific value in our manuscript. Thanks to the comments, we were pleased to be able to revise and improve the manuscript. Individual responses are listed below:

      Strengths

      (1) The paper uses technically challenging techniques to record from multiple mossy cells at the same time, while also recording SWs from the LFP of the CA3 layer. The data appear to be collected carefully and analyzed thoughtfully.

      (2) The question of how mossy cells process feedback input from CA3 is important to understand the role of this feedback pathway in hippocampal processing.

      (3) Provided that the concerns expressed below about proper statistical testing are resolved, the data appear supportive of the main conclusions of the authors and suggest that, to some degree, the much smaller population of mossy cells can conserve the information present in the larger population of CA3 cells, presumably by using a more compressed, dense population code.

      We appreciate the reviewer’s critical evaluations, which have encouraged us to revise and resubmit this manuscript. We have revised our manuscript in line with the reviewer’s comments. Our point-by-point responses are provided below:

      Weaknesses

      (4) Some of the statistical tests appear inappropriate because they treat each CA3 SW and associated Vm from a mossy cell as independent samples. This violates the assumptions of statistical tests such as the Kolmogorov-Smirnov tests of Figure 3C and Fig 3E. Although there is large variability among the SWs recorded and among the Vm's, they cannot be considered independent measurements if they derive from the same cell and same recording site of an individual animal. This becomes especially problematic when the number of dependent samples adds up to the tens of thousands, providing highly inflated numbers of samples that artificially reduce the p values. Techniques such as mixed-effects models are being increasingly used to factor out the effects of within cell and within animal correlations in the data. The authors need to do something similar to factor out these contributions in order to perform statistical tests, throughout the manuscript when this problem occurs.

      Thank you for the insightful comment. Regarding correlation between animals, the mice were obtained at the same age and kept in the same environment, so we do not think it is necessary to account for differences due to environmental factors. As the reviewer pointed out, we cannot completely rule out the possibility that within-cell or within-animal correlation influences the results, so we plotted the differences in prediction accuracy between cells, slices, and animals (Figure 3 - figure supplement 7). The prediction accuracy of the real data was better than that of the shuffled data in 66 of the 87 MCs (75.9%). In response to the comment that measurements from the same animal do not constitute independent samples, we calculated the average ΔRMSE for each mouse and found that these values were significantly different from 0 (n = 14, *p = 0.0041, Student’s t-test). In other words, even when each animal is treated as a single independent sample, the difference remains statistically significant.
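      As an illustration of the per-animal analysis described in this response, the sketch below collapses ΔRMSE values to one mean per animal and then computes a one-sample t statistic against 0, so that the sample size is the number of animals rather than the number of events. It is a minimal sketch, not the authors' code: the function names, the record layout, and the example values are hypothetical, and the sign convention of ΔRMSE is not defined in the response.

```python
import math
from collections import defaultdict
from statistics import mean, stdev


def per_animal_means(records):
    """Average delta-RMSE within each animal.

    records: iterable of (animal_id, delta_rmse) pairs, one per cell or
    event (hypothetical layout).
    """
    grouped = defaultdict(list)
    for animal_id, delta in records:
        grouped[animal_id].append(delta)
    return {animal: mean(values) for animal, values in grouped.items()}


def one_sample_t(values, popmean=0.0):
    """Return (t statistic, degrees of freedom) for H0: mean == popmean."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    t = (mean(values) - popmean) / (stdev(values) / math.sqrt(n))
    return t, n - 1


if __name__ == "__main__":
    # Made-up numbers, for demonstration only.
    records = [("A", 0.1), ("A", 0.3), ("B", 0.4), ("C", 0.2), ("C", 0.4)]
    animal_means = per_animal_means(records)
    print(one_sample_t(list(animal_means.values())))
```

      A p value can then be obtained from the t distribution with the returned degrees of freedom, for example with `scipy.stats.ttest_1samp` applied to the per-animal means.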

      (5) A separate statistical problem occurs when comparing real data against a shuffled, surrogate data set. From the methods, I gather that Figure 3C combined data from 100 surrogate shuffles to compare to the real data. It is inappropriate to do a classic statistical test of data against such shuffles, because the number of points in the pooled surrogate data sets are not true samples from a population. It is a mathematical certainty that one can eventually drive a p value to < 0.05 just by increasing the number of shuffles sufficiently. Thus, the p value is determined by the number of computer shuffles allowed by the time and processing power of a computer, rather than by sampling real data from the population. Figures such as 4C and 5A are examples that test data against shuffle appropriately, as a single value is determined to be within or outside the 95% confidence interval of the shuffle, and this determination is not directly affected by the number of shuffles performed.

      Thank you for raising a very good point. We understand the reviewer's concern, but we cannot fully agree with the statement that "It is a mathematical certainty that one can eventually drive a p value to < 0.05 just by increasing the number of shuffles sufficiently", because when the compared data do not differ at all, no amount of shuffling will produce a significant difference. We do agree, however, that increasing the number of shuffles will lower the p value when the compared data differ even slightly. Following the reviewer's comment, we used a paired t-test to test whether the difference between RMSEreal and RMSEsurrogate differed from 0, and found that it did so significantly (Figure 3 - figure supplement 5). Even with a paired t-test, as in Figure 3E, a significant difference between the prediction errors of the real and shuffled data was observed for all MC number inputs and also for the in vivo data.
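      To make the distinction in this exchange concrete, the sketch below compares a single observed value with a surrogate distribution, in the manner the reviewer describes for Figures 4C and 5A: the observed value is checked against the 95% interval of the surrogates, and an empirical p value is computed with the standard add-one correction. Adding more shuffles refines these estimates but does not add samples to a test. This is a minimal sketch under stated assumptions, not the analysis used in the manuscript: the function names are hypothetical, lower RMSE is assumed to mean better prediction, and the numbers are made up.

```python
import math


def percentile_interval(surrogates, lower=0.025, upper=0.975):
    """Return a conservative (lower, upper) percentile interval of surrogates."""
    if not surrogates:
        raise ValueError("need at least one surrogate value")
    ordered = sorted(surrogates)
    n = len(ordered)
    lo_idx = int(math.floor(lower * (n - 1)))
    hi_idx = int(math.ceil(upper * (n - 1)))
    return ordered[lo_idx], ordered[hi_idx]


def empirical_p_lower(observed, surrogates):
    """One-sided empirical p value that the observed value is this low.

    Uses the (k + 1) / (n + 1) correction so the p value is never zero.
    """
    if not surrogates:
        raise ValueError("need at least one surrogate value")
    k = sum(1 for s in surrogates if s <= observed)
    return (k + 1) / (len(surrogates) + 1)


if __name__ == "__main__":
    # Made-up surrogate RMSE values and a made-up observed RMSE.
    surrogate_rmse = [1.0 + 0.01 * i for i in range(100)]
    observed_rmse = 0.8
    print(percentile_interval(surrogate_rmse))
    print(empirical_p_lower(observed_rmse, surrogate_rmse))
```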

      (6) The last line of the Discussion states that this study provides "important insights into the information processing of neural circuits at the bottleneck layer," but it is not clear what these insights are. If the statistical problems are addressed appropriately, then the results do demonstrate that the information that is reflected in SWs can be reconstructed by cells in the MC bottleneck, but it is not certain what conceptual insights the authors have in mind. They should discuss more how these results further our understanding of the function of the feedback connection from CA3 to the mossy cells, and discuss any limitations on their interpretation from recording LFPs rather than the single-unit ensemble activity (where the information is really encoded).

      Thank you for your insightful comment. We have added the following text to the discussion:

      “Given that different SWRs may encode information that correlates with different experiences, it is also possible that the activity of individual MCs may play a role in encoding different experiences via SWRs. Indeed, several in vivo studies have confirmed that MC activity is involved in spatial encoding (Bui et al., 2018; Huang et al., 2024). However, the relationship with SWRs has not been investigated. The significance of the fact that the SWR recorded from CA3 is reflected in the MC as synaptic input is that it not only shows the transmission pathway from CA3 to MC, but also reveals the information below the threshold that leads to firing, and in a broad sense, it approaches the mechanism by which information is processed by neuronal firing. Moreover, the expression of synaptic input to the MC is not uniform, but varies according to the pattern of SWR. Based on previous research showing that diversity is important for information representation (Padmanabhan and Urban, 2010; Tripathy et al., 2013), it is possible that this heterogeneity in membrane potential levels, rather than the all-or-none output of neuronal firing activity, is the key to encoding more precise information. In this respect, our research, which focuses on information encoding at the subthreshold level, may be able to extract even more information than information encoded by firing activity.” (Page 14, Line 419)

      (7) In Figure 1C, the maximum of the MC response on the first inset precedes the SW, and the onset of the Vm response may be simultaneous with SW. This would suggest that the SW did not drive the mossy cell, but this was a coincident event. How many SW-mossy cell recordings are like this? Do the authors have a technical reason to believe that these are events in which the mossy cell is driven by the CA3 cells active during the SW?

      Thank you for your insightful comment. Based on your comment, we aligned all MC EPSPs to the onset of each SWR and found that the EPSPs rise after the SWR onset (Figure 1 - figure supplement 2). This leads us to believe that the EPSP of the MC is most likely driven by the SWR.
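      The alignment described in this response is a form of event-triggered averaging, which could be sketched as follows. This is an illustrative sketch only, not the authors' code: the function name, the representation of the membrane potential as a list of samples, and the window sizes are assumptions, and events too close to the edges of the recording are simply skipped.

```python
def event_triggered_average(trace, onset_indices, pre, post):
    """Average segments of `trace` aligned to each onset index.

    Each segment spans `pre` samples before the onset and `post` samples
    from the onset onward. Onsets whose window would run past either end
    of the trace are dropped.
    """
    segments = [
        trace[i - pre : i + post]
        for i in onset_indices
        if i - pre >= 0 and i + post <= len(trace)
    ]
    if not segments:
        raise ValueError("no onset has a complete window inside the trace")
    # Average sample-by-sample across the aligned segments.
    return [sum(samples) / len(segments) for samples in zip(*segments)]


if __name__ == "__main__":
    # Toy trace with a depolarization starting at samples 2 and 6.
    vm = [0, 0, 1, 2, 0, 0, 1, 2, 0]
    print(event_triggered_average(vm, onset_indices=[0, 2, 6], pre=1, post=2))
```

      In an averaged trace of this kind, a deflection that begins at or after the alignment point is consistent with the response following the event rather than preceding it.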

    1. eLife Assessment

      The authors describe an approach to construct hybrid neuraminidase molecules that express epitopes (loops) of a specific neuraminidase grafted onto another neuraminidase. The loops (epitopes) are from low-expressing neuraminidases and the scaffold is derived from a high-expressing neuraminidase. This paper is an important contribution giving new insights into the structure, function, and immunogenicity of influenza virus neuraminidases. The paper presents convincing evidence supporting the conclusions arrived at by the authors.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript described a structure-guided approach to graft important antigenic loops of the neuraminidase to a homotypic but heterologous NA. This approach allows the generation of well-expressed and thermostable recombinant proteins with antigenic epitopes of choice to some extent. The loop-grafted NA was designated hybrid.

      Strengths:

      The hybrid NA appeared to be more structurally stable than the loop-donor protein while acquiring its antigenicity. This approach is of value when developing a subunit NA vaccine that is difficult to express, because antigenic loops could potentially be grafted onto a stable NA scaffold to transfer strain-specific antigenicity.

      Weaknesses:

      However, major revisions are needed to better organize the text and figures and to clarify a number of points. There are a few cases in which a later figure was described first, data in the figures were not sufficiently described, or where there were mismatched references to figures.

      More importantly, the hybrid proteins did not show any of the advantages over the loop-donor protein in the format of VLP vaccine in mouse studies, so it's not clear why such an approach is needed to begin with if the original protein is doing fine.

    3. Reviewer #2 (Public review):

      In their manuscript, Rijal and colleagues describe a 'loop grafting' strategy to enhance expression levels and stability of recombinant neuraminidase. The work is interesting and important, but there are several points that need the authors' attention.

      Major points

      (1) The authors overstress the importance of the epitopes covered by the loops they use and play down the importance of antibodies binding to the side, the edges, or the underside of the NA. A number of papers describing those mAbs are also not included.

      (2) The rationale regarding the PR8 hybrid is not well described and should be described better.

      (3) Figure 3B and 6C: This should be given as numbers (quantified), not as '+'.

      (4) Figure 5A and 7A: Negative controls are missing.

      (5) The authors claim that they generate stable tetramers. Judging from SDS-PAGE provided in Supplementary Figure 3B (BS3-crosslinked), many different species are present including monomers, dimers, tetramers, and degradation products of tetramers. In lane 7 for example there are at least 5 bands.

    4. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript described a structure-guided approach to graft important antigenic loops of the neuraminidase to a homotypic but heterologous NA. This approach allows the generation of well-expressed and thermostable recombinant proteins with antigenic epitopes of choice to some extent. The loop-grafted NA was designated hybrid.

      Strengths:

      The hybrid NA appeared to be more structurally stable than the loop-donor protein while acquiring its antigenicity. This approach is of value when developing a subunit NA vaccine that is difficult to express, because antigenic loops could potentially be grafted onto a stable NA scaffold to transfer strain-specific antigenicity.

      Weaknesses:

      However, major revisions are needed to better organize the text and figures and to clarify a number of points. There are a few cases in which a later figure was described first, data in the figures were not sufficiently described, or where there were mismatched references to figures.

      More importantly, the hybrid proteins did not show any of the advantages over the loop-donor protein in the format of VLP vaccine in mouse studies, so it's not clear why such an approach is needed to begin with if the original protein is doing fine.

      We thank the reviewer for their helpful comments. We have incorporated the reviewers' feedback to improve the manuscript. Please see our point-by-point response.

      The purpose of loop-grafting between H5N1/2021 (a high-expressor) and the PR8 virus was not to improve the expression of PR8 NA, which already expresses well. Instead, the loop-grafting and the in vivo experiments were done to demonstrate loop-specific protection following a lethal PR8 virus challenge.

      Reviewer #2 (Public review):

      In their manuscript, Rijal and colleagues describe a 'loop grafting' strategy to enhance expression levels and stability of recombinant neuraminidase. The work is interesting and important, but there are several points that need the authors' attention.

      Major points

      (1) The authors overstress the importance of the epitopes covered by the loops they use and play down the importance of antibodies binding to the side, the edges, or the underside of the NA. A number of papers describing those mAbs are also not included.

      We have discussed the distribution of epitopes on the NA molecule in the Discussion section "The distribution of epitopes in neuraminidase" (new line number 350). In Supplementary Figures 1 and 2, we have compiled the epitopes recognized by polyclonal sera and mAbs, as identified by escape virus selection or crystal structure studies. There are 45 residue examples from escape virus selection, and we found that approximately 90% of the epitopes are located within the top loops (Loops 01 and Loops 23, which include the lateral sides and edges of NA). We have also included the epitopes of underside mAbs NDS.1 and NDS.3 in Supplementary Figure 2. Some of the interactions formed by these mAbs are also within the L01 and L23 loops. All relevant references are cited in Supplementary Figures 1 and 2.

      A new figure has been added [Figure 1b (ii)] to illustrate the surface mapping of epitopes on NA.

      (2) The rationale regarding the PR8 hybrid is not well described and should be described better.

      We described the rationale for the PR8 hybrid (new lines 247-250). For clarity, we have added the following sentence within the section "Loop transfer between two distant N1 NAs:...."

      (new lines 255-258):

      "mSN1 showed sufficient cross-reactivity to N1/09 to protect mice against virus challenge. Therefore, we performed loop transfer between mSN1 and PR8N1, which differ by 18 residues within the L01 and L23 loops and show no or minimal cross-reactivity, to assess the loop-specific protection."

      (3) Figure 3B and 6C: This should be given as numbers (quantified), not as '+'.

      We have included the numerical data in Supplementary Figure 6. The data are presented in a semi-quantitative manner for simplicity. To improve clarity, we have now added the following sentence to the Figure 3c legend: "Refer to Supplementary Figure 6 for binding titration data".

      (4) Figure 5A and 7A: Negative controls are missing.

      A pool of Empty VLP sera was included as a negative control, showing no inhibition at 1:40 dilution. In the figure legends, we have stated "Pooled sera to unconjugated mi3 VLP was negative control and showed no inhibition at 1:40 dilution (not included in the graphs)"

      (5) The authors claim that they generate stable tetramers. Judging from SDS-PAGE provided in Supplementary Figure 3B (BS3-crosslinked), many different species are present including monomers, dimers, tetramers, and degradation products of tetramers. In lane 7 for example there are at least 5 bands.

      The tetrameric conformation of the soluble proteins is evidenced by the size-exclusion chromatograms shown in Figures 3a and 6b. The BS3-crosslinked SDS-PAGE data are only suggestive, indicating that the protein is a tetramer if a band appears at ~250 kDa. However, depending on the reaction conditions, lower molecular weight bands may also be observed if crosslinking is incomplete.

    1. eLife Assessment

      This manuscript presents a potentially important strategy for stimulating mammalian Müller glia to proliferate in vivo by manipulating cell cycle components. The results are convincing that a large number of Müller glia can be induced to re-enter the cell cycle without a damage stimulus. These findings are likely to appeal to retinal biologists and neuroscientists in general.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, Wu et al. introduce a novel approach to reactivate the Müller glia cell cycle in the mouse retina by simultaneously reducing p27Kip1 and increasing cyclin D1 using a single AAV vector. The approach effectively promotes Müller glia proliferation and reprogramming without disrupting retinal structure or function. Interestingly, reactivation of the Müller glia cell cycle downregulates the IFN pathway, which may contribute to the induced retinal regeneration. The results presented in this manuscript may offer a promising approach for developing Müller glia cell-mediated regenerative therapies for retinal diseases.

      Comments on revisions:

      The authors have revised the manuscript and addressed my concerns.

    3. Reviewer #2 (Public review):

      This manuscript by Wu, Liao et al. reports that simultaneous knockdown of P27Kip1 with overexpression of Cyclin D can stimulate Müller glia to re-enter the cell cycle in the mouse retina. There is intense interest in reprogramming mammalian Müller glia into a source for neurogenic progenitors, in the hopes that these cells could be a source for neuronal replacement in neurodegenerative diseases. Previous work in the field has shown ways in which mouse Müller glia can be neurogenically reprogrammed and these studies have shown cell cycle re-entry prior to neurogenesis. In other works, typically, the extent of glial proliferation is limited, and the authors of this study highlight the importance of stimulating large numbers of Müller glia to re-enter the cell cycle with the hopes they will differentiate into neurons.

      The authors have satisfactorily responded to all my previous reviewer comments. The authors have significantly improved their imaging quality in Figures 1 and 4. The authors have admirably re-considered their FISH and scRNA-seq data and performed critical control experiments. They now provide a more nuanced interpretation of their data by removing reference to MG-inducing rod genes, which is now interpreted as ambient contamination. Taken together, this manuscript now provides strong evidence of a viral way to induce large numbers of MG to re-enter the cell cycle without a damage stimulus.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, Wu et al. introduce a novel approach to reactivate the Müller glia cell cycle in the mouse retina by simultaneously reducing p27Kip1 and increasing cyclin D1 using a single AAV vector. The approach effectively promotes Müller glia proliferation and reprogramming without disrupting retinal structure or function. Interestingly, reactivation of the Müller glia cell cycle downregulates the IFN pathway, which may contribute to the induced retinal regeneration. The results presented in this manuscript may offer a promising approach for developing Müller glia cell-mediated regenerative therapies for retinal diseases.

      Strengths:

      The data are convincing and supported by appropriate, validated methodology. These results are both technically and scientifically exciting and are likely to appeal to retinal specialists and neuroscientists in general.

      Weaknesses:

      There are some data gaps that need to be addressed.

      (1) Please label the time points of AAV injection, EdU labeling, and harvest in Figure 1B.

      We thank the reviewer for highlighting the lack of clarity in our experimental design. We have labeled all experiment timelines in the figures where appropriate in the revised version.

      (2) What fraction of Müller cells were transduced by AAV under the experimental conditions?

      We apologize for not clearly explaining the AAV transduction efficiency. AAV transduction efficiency was not uniform across the retinas. The retinal region adjacent to the optic nerve exhibits a transduction efficiency of nearly 100%. In contrast, the peripheral retina shows a lower transduction efficiency compared to the central region. Representative retinal sections with a typical infection pattern are shown in Supplementary Figure 4. The quantification of EdU+ MG or other markers was conducted in a 250 µm region with the highest efficiency. For the scRNA-seq experiment, retinal regions with high AAV transduction efficiency were dissected with the aid of a control GFP virus.

      (3) It seems unusually rapid for MG proliferation to begin as early as the third day after CCA injection. Can the authors provide evidence for cyclin D1 overexpression and p27 Kip1 knockdown three days after CCA injection?

      We have included data showing that GFP expression is evident at 3 days post AAV-GFP-GFP injection (Supplementary Fig. 1B). Additionally, we performed immunostaining and confirmed cyclin D1 overexpression at 3 days post CCA injection (Fig. 2E), and performed qPCR analysis to confirm cyclin D1 overexpression and p27kip1 knockdown at the same time point (Supplementary Fig. 5).

      (4) The authors reported that MG proliferation largely ceased two weeks after CCA treatment. While this is an interesting finding, the explanation that it might be due to the dilution of AAV episomal genome copies in the dividing cells seems far-fetched.

      We agree with the reviewer that dilution of AAV episomal genomes is unlikely to be the sole reason for the cessation of MG proliferation. By staining for cyclin D1 at various days post CCA injection, we found that cyclin D1 is immediately downregulated in mitotic MG undergoing interkinetic nuclear migration to the outer nuclear layer (Fig. 2G-I). In contrast, the effect of p27<sup>kip1</sup> knockdown by CCA lasted longer (Supplementary Figures 9-10). It is possible that other anti-proliferative genes are involved in the immediate downregulation of cyclin D1.

      Reviewer #2 (Public Review):

      This manuscript by Wu, Liao et al. reports that simultaneous knockdown of P27Kip1 with overexpression of Cyclin D can stimulate Muller glia to re-enter the cell cycle in the mouse retina. There is intense interest in reprogramming mammalian Muller glia into a source of neurogenic progenitors, in the hope that these cells could be a source for neuronal replacement in neurodegenerative diseases. Previous work in the field has shown ways in which mouse Muller glia can be neurogenically reprogrammed, and these studies have shown cell cycle re-entry prior to neurogenesis. In other works, typically, the extent of glial proliferation is limited, and the authors of this study highlight the importance of stimulating large numbers of Muller glia to re-enter the cell cycle in the hope that they will differentiate into neurons. While the evidence for stimulating proliferation in this study is convincing, the evidence for neurogenesis is not convincing or robust, suggesting that stimulating cell cycle re-entry may not be associated with increasing regeneration without another proneural stimulus.

      Below are concerns and suggestions.

      Intro:

      (1) The authors cite past studies showing "direct conversion" of MG into neurons. However, these studies (PMID: 34686336; 36417510) show EdU+ MG-derived neurons suggesting cell cycle re-entry does occur in these strategies of proneural TF overexpression.

      We thank the reviewer for pointing this out. We have revised the statement to "MG reprogramming".

      (2) Multiple citations are incorrectly listed, using the authors' first names only (e.g., Yumi, et al.; Levi, et al.). Studies are also incompletely referenced in the references.

      We apologize for the mistakes in the references. We have corrected them in the revised version.

      Figure 1:

      (3) When are these experiments ending? Figure 1B says "analysis" at the end of the paradigm without an actual day associated with it. This is the case for many later figures too. The authors should update the paradigms to accurately reflect experimental end points.

      We thank the reviewer for highlighting the lack of clarity in our experimental design. We have labeled all experiment timelines in the figures where appropriate in the revised version.

      (4) Are there more representative pictures for P27kd and CyclinD OE? The EdU+ counts indicate a 3-fold increase between Figure 1D and 1E; however, the pictures do not reflect this. In fact, most of the EdU+ cells in Figure 1E don't seem to be Sox9+ MG but rather horizontally oriented nuclei in the OPL that are likely microglia.

      We thank the reviewer for pointing this out. We have replaced the image of the cyclin D1 OE retina with a more representative image.

      (5) Is the infection efficacy of these viruses different between the different combinations (i.e. CyclinD OE vs. P27kd vs. control vs. CCA combo)? The counts in Figure 1G show only Sox9+/EdU+ cells and are not divided by virus efficacy. If these are absolute counts, blind to where the virus is and how many cells it hits, variation in virus efficacy could drive absolute differences that aren't actually biological.

      To rule out the possibility that the differences in MG proliferation across groups are due to variations in viral efficacy, we examined the p27<sup>kip1</sup> knockdown and cyclin D1 overexpression efficiencies for all four groups by qPCR analysis. The results showed that the cyclin D1 overexpression efficiency of the AAV-GFAP-Cyclin D1 virus alone and the p27 knockdown efficiency of AAV-GFAP-mCherry-p27kip1 shRNA1 alone are comparable to, if not higher than, those of the CCA virus (Supplementary Fig. 5). Therefore, virus efficacy cannot explain the drastic increase in MG proliferation by CCA.

      As the central retina usually had 100% infection efficacy (Supplementary Fig. 4), we quantified the EdU+Sox9+ cell number in the 250 µm regions next to the optic nerve.

      (6) According to the Jax laboratories, mice aren't considered aged until they are over 18 months old. While it is interesting that CCA treatment does not seem to lose efficacy over maturation, I would rephrase the findings, as the experiment does not test this virus in aged retinas.

      We thank the reviewer for bringing this to our attention. We have changed the wording to “older adult mice” in our revised manuscript.

      (7) Supplemental Figure 2c-d. These viruses do not hit 100% of MG, however 100% of the P27Kip staining is gone in the P27sh1 treatment, even the P27+ cell in the GCL that is likely an astrocyte has no staining in the shRNA 1 picture. Why is this?

      We have replaced the images in Supplementary Fig. 2B-D.

      Figure 2

      (8) Would you expect cells to go through two rounds of cell cycle in such a short time? The treatment of giving EdU then BrdU 24 hours later would have to catch a cell going through two rounds of division in a very short amount of time. Again, the end point should be added graphically to this figure.

      We thank the reviewer for the comment. We repeated the EdU/BrdU co-labeling experiment with extended periods of EdU and BrdU injections. Based on the results of the MG proliferation time course study (Fig. 2A), we gave five EdU injections from D1 to D5 and five BrdU injections from D6 to D10 post-CCA injection, which covered the major phase of MG proliferation (Fig. 2B-C). Consistent with the previous findings, we did not observe any BrdU and EdU double-positive MG cells.

      Additionally, we showed that cyclin D1 overexpression immediately ceased in migrating mitotic MG (Fig. 2G-I), which may explain why CCA-treated MG do not progress to the second round of cell division.

      Figure 3

      (9) I am confused by the mixing of ratios of viruses to indicate infection success. I know mixtures of viruses containing CCA or control GFP or a control LacZ were injected. Was the idea to probe for GFP or LacZ in the single cell data to see which cells were infected but not treated? This is not shown anywhere.

      The virus infection was not uniform across the entire retina (Supplementary Fig. 4). To mark the infection hotspots, we added 10% GFP virus to the mixture. Regions of the retina with low infection efficiency were removed by dissection and excluded from the scRNA-seq analysis. Therefore, we assumed that the vast majority of MG were infected by CCA. We apologize for not clearly explaining this methodological detail in the original text. We have added the experimental design to Fig. 3A and revised the Results section (lines 191-196) accordingly.

      (10) The majority of glia sorted from TdTomato are probably not infected with virus. Can you subset cells that were infected only for analysis? Otherwise it makes it very hard to make population judgements like Figure 3E-H if a large portion are basically WT glia.

      This question is related to the last one. Since the regions with high virus infection efficiency were selectively dissected and isolated for analysis, the CCA-infected MG should constitute the vast majority of MG in the scRNA-seq data.

      (11) In Figure 3C you can see Rho is expressed everywhere, which is common in studies like this because the ambient RNA is so high. This makes it very hard to talk about "Rod-like" MG, as this is probably an artifact of the technique. Almost all scRNA-seq studies of MG reprogramming have shown clusters of "rods" with MG hybrid gene expression, and these have in the past been considered an artifact.

      We agree with the reviewer that the high rod gene expression in the rod-MG cluster is an artifact. We have performed multiple rounds of RNA in situ hybridization on isolated MG nuclei. The counts of Gnat1 and Rho mRNA signal largely overlap between the two samples with and without CCA treatment (Supplementary Fig. 14). Some MG in the control retinas without CCA treatment had up to 7 or 8 dots per cell, suggesting contamination by attached rod cell debris during retina dissociation (Supplementary Fig. 14). Therefore, the results do not support rod-MG being a reprogrammed MG population with rod gene upregulation.

      (12) It is mentioned that the "glial" signature is downregulated in response to CCA treatment. Where is this shown convincingly? Figure H has a feature plot of Glul, for which it is not clear that expression changes between treatments. Otherwise, MG genes are shown as a function of cluster, not treatment.

      We have added box plots of several MG-specific genes to illustrate the downregulation of the glial signature in the relevant cell cluster in the revised manuscript (Supplementary Fig. 15).

      Figure 4

      (13) The authors should be commended for being very careful in their interpretations. They employ the proper controls (Er-Cre lineage tracing/EdU-pulse chasing/scRNA-seq omics) and were very careful in attempting to see MG-derived rods. This makes the conclusion from the FISH perplexing. The few puncta of Rho and GNAT in MG are not convincing to this reviewer: Rho and GNAT dots are dense everywhere throughout the ONL, and if you drew any random circle in the ONL it would be full of dots. The rigor of these counts also comes into question because some dots are picked up in MG in the INL even in the control case. This is confusing because baseline healthy MG do not express RNA transcripts of these rod genes, so what is this picking up? Taken together, the conclusion that there are Rod-like MG is based on scRNA-seq data (which likely reflect ambient contamination) and these FISH images. I don't think these data warrant the conclusion that MG upregulate rod genes in response to CCA.

      Given the results of RNA in situ hybridization on isolated MG, we revisited the results of the RNA in situ hybridization on retinal sections as well. We performed RNA in situ hybridization on retinal sections at 1 week post CCA treatment, expecting to see lower Gnat1 and Rho signals in the ONL-localizing MG compared to 3 weeks and 4 months post CCA treatment. However, we observed similar levels across all three time points (data not shown). The lack of dynamic changes in rod gene expression levels also suggests contamination from tightly surrounding neighboring rods. Consequently, we have reinterpreted the scRNA-seq and RNA FISH data and withdrawn the conclusion that MG upregulated rod genes after CCA treatment. We thank the reviewer for pointing out this potential issue and helping us avoid an incorrect conclusion.

      Figure 5

      (14) Similar point to above, but this Glul probe seems odd. Why is it throughout the ONL but completely dark through the IPL? It should also be in astrocytes; can you see it in the GCL? These retinas look cropped at the INL, below which everything is completely black. The whole retinal section should be shown. Antibodies to GS that work in mouse exist, along with antibodies to many other MG genes; IHC or western blots could be done to better serve this point.

      We have replaced the images in Figure 4 in the revised manuscript. Additionally, we have performed Sox9 antibody staining to demonstrate partial MG dedifferentiation following CCA treatment (Figure 5).

      Figure 6

      (15) Figure 6D is not a co-labeled OTX2+/ TdTomato+ cell, Otx2 will fill out the whole nucleus as can be seen with examples from other MG-reprogramming papers in the field (Hoang, et al. 2020; Todd, et al. 2020; Palazzo, et al. 2022). You can clearly see in the example in Figure 6D the nucleus extending way beyond Otx2 expression as it is probably overlapping in space. Other examples should be shown, however, considering less than 1% of cells were putatively Otx2+, the safer interpretation is that these cells are not differentiating into neurons. At least 99.5% are not.

      We have replaced the image of the Otx2+ Tdt+ EdU+ cell with one that shows the whole nucleus filled with strong Otx2 staining.

      (16) Same as above: Figure 6I is not convincingly co-labeled. HuC/D is an RNA-binding protein and unfortunately is not always the clearest stain, but this looks like background haze in the INL overlapping. Other amacrine markers could be tested, but again, due to the very low numbers, I think no neurogenesis is occurring.

      Since we didn’t find HuC/D+Tdt+EdU+ cells at 3 weeks post CCA treatment, we believe that the weak HuC/D+ staining in the MG daughter cells at 4 months is not background, but rather reflects an incomplete neurogenic switch. This suggests that the process of neurogenesis may be ongoing but not fully realized within the observed timeframe without additional stimuli.

      (17) In the text the authors are accidentally referring to Figure 6 as Figure 7.

      We thank the reviewer for pointing out the mistake. We will correct the mistake in the revised manuscript.

      Figure 7

      (18) I like this figure and the concept that you can have additional MG proliferating without destroying the retina or compromising vision. This is reminiscent of the chick MG reprogramming studies in which MG proliferate in large numbers and often do not differentiate into neurons yet still persist de-laminated for long time points.

      General:

      (19) The title should be changed, as I don't believe there is any convincing evidence of regeneration of neurons. Understanding the barriers to MG cell-cycle re-entry is important, and I believe the authors did a good job in that respect; however, it is an oversell to report regeneration of neurons from these data.

      We thank the reviewer for the suggestion. We have changed the title to “Simultaneous cyclin D1 overexpression and p27kip1 knockdown enable robust Müller glia cell cycle reactivation in uninjured mouse retina” in the revised manuscript.

      (20) This paper uses multiple mouse lines and it is often confusing when the text and figures switch between models. I think it would be helpful to readers if the mouse strain was added to graphical paradigms in each figure when a different mouse line is employed.

      We have labeled the mouse lines used in each experiment in the figures where appropriate.

    1. eLife Assessment

      This study provides valuable insight into the role of Meis2 in whisker hair follicle formation and confirms prior work that nerves are dispensable for this process. The solid imaging techniques support the authors' conclusions; however, the data provide limited evidence to support the mechanism of Meis2 in whisker formation.

    2. Reviewer #1 (Public review):

      Summary:

      Mehmet Mahsum Kaplan et al. demonstrate that Meis2 expression in neural crest-derived mesenchymal cells is crucial for whisker follicle (WF) development, as WF fails to develop in wnt1-Cre;Meis2 cKO mice. Advanced imaging techniques effectively support the idea that Meis2 is essential for proper WF development and that nerves, while affected in Meis2 cKO, are dispensable for WF development and not the primary cause of WF developmental failure. The study also reveals that although Meis2 significantly downregulates Foxd1 in the mesenchyme, this is not the main reason for WF development failure. The paper presents valuable data on the role of mesenchymal Meis2 in WF development. However, the molecular mechanisms that link Meis2 to its impact on the epithelial compartment remain unknown.

      Strengths:

      (1) The authors describe a novel molecular mechanism involving Mesenchymal Meis2 expression, which plays a crucial role in early WF development.

      (2) They employ multiple advanced imaging techniques to illustrate their findings beautifully.

      (3) The study clearly shows that nerves are not essential for WF development.

      Weaknesses:

      The paper lacks clarity on how Meis2 loss, along with the observed general reduction in proliferation and changes in extracellular matrix and cell adhesion, leads specifically to the loss of whisker follicles. Future studies addressing this gap, perhaps with methods enabling higher cell recovery or epithelial cell inclusion in the sequenced cells, could provide valuable insights into the specific roles of Meis2 in this context.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, Kaplan et al. study mesenchymal Meis2 in whisker formation and the links between whisker formation and sensory innervation. To this end, they used conditional deletion of Meis2 using the Wnt1 driver. Whisker development was arrested at the placode induction stage in Meis2 conditional knockouts leading to absence of expression of placodal genes such as Edar, Lef1, and Shh. The authors also show that branching of trigeminal nerves innervating whisker follicles was severely affected but that whiskers did form in the complete absence of trigeminal nerves.

      Strengths:

      The analysis of Meis2 conditional knockouts convincingly shows a lack of whisker formation and of all epithelial whisker/hair placode markers analyzed. Using Neurog1 knockout mice, the authors show that whiskers and teeth develop in the complete absence of trigeminal nerves.

      Comments on revised version:

      In the revised manuscript, Kaplan et al. have addressed some of my previous concerns, e.g., the methodological section has been updated to include the relevant information, and the Introduction now better considers the previous literature.

      In the revised manuscript, the authors have made limited efforts to address the main criticism of my original review: lack of mechanistic insight as to why mesenchymal Meis2 leads to the absence of whisker placodes. The new data reported indicate that the lack of whisker placodes is not a mere delay. In this context, the authors also show one image of E18.5 snouts that includes developing hair follicles. Interestingly, the image shown seems to indicate that hair follicles do develop normally in the absence of mesenchymal Meis2, although this finding is not reported in any detail or quantified. The authors suggest that this could be due to an early role of Meis2 in the mesenchyme because HFs develop later. Indeed, one plausible possibility is that Meis2 does not have any direct role in whisker (or hair) follicle development but is specifically required for some other function in the whisker pad mesenchyme, a function that remains unidentified in the current study as it mainly focuses on analyzing hair follicle marker expression in whisker follicles. I think this should be better reflected in the Discussion.

      Additional comments:

      The revised manuscript included the quantification of Lef1 intensity in control and Meis2 cKO whisker follicles (lines 251-252 and 255-258). I may have missed it, but I could not find how the intensities were quantified, and therefore it was not possible for me to evaluate this part of the data. Nevertheless, I think the main text is not the place for these quantifications; rather, they would better fit e.g. Suppl. Figure 4.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Mehmet Mahsum Kaplan et al. demonstrate that Meis2 expression in neural crest-derived mesenchymal cells is crucial for whisker follicle (WF) development, as WF fails to develop in wnt1-Cre;Meis2 cKO mice. Advanced imaging techniques effectively support the idea that Meis2 is essential for proper WF development and that nerves, while affected in Meis2 cKO, are dispensable for WF development and not the primary cause of WF developmental failure. The study also reveals that although Meis2 significantly downregulates Foxd1 in the mesenchyme, this is not the main reason for WF development failure. The paper presents valuable data on the role of mesenchymal Meis2 in WF development. However, further quantification and analysis of the WF developmental phenotype would be beneficial in strengthening the claim that Meis2 controls early WF development rather than causing a delay or arrest in development. A deeper sequencing data analysis could also help link Meis2 to its downstream targets that directly impact the epithelial compartment.

      Strengths:

      (1) The authors describe a novel molecular mechanism involving Mesenchymal Meis2 expression, which plays a crucial role in early WF development.

      (2) They employ multiple advanced imaging techniques to illustrate their findings beautifully.

      (3) The study clearly shows that nerves are not essential for WF development.

      We thank the reviewer for valuable comments that will help improve our study.

      Weaknesses:

      (1) The authors claim that Meis2 acts very early during development, as evidenced by a significant reduction in EDAR expression, one of the earliest markers of placode development. While EDAR is indeed absent from the lower panel in Figure 3C of the Meis2 cKO, multiple placodes still express EDAR in the upper two panels of the Meis2 cKO. The authors also present subsequent analysis at E13.3, showing one escaped follicle positive for SHH and Sox9 in Figures 1 and 3. Does this suggest that follicles are specified but fail to develop? Alternatively, could there be a delay in follicle formation? The increase in Foxd1 expression between E12.5 and E13.5 might also indicate delayed follicle development, or as the authors suggest, follicles that have escaped the phenotype. The paper would significantly benefit from robust quantification to accompany their visual data, specifically quantifying EDAR, Sox9, and Foxd1 at different developmental stages. Additionally, analyzing later developmental stages could help distinguish between a delay or arrest in WF development and a complete failure to specify placodes.

      The earliest DC (FOXD1) and placodal (EDAR, LEF1) markers tested in this study were observed only in the escaped WFs, whereas these markers were missing at the expected WF sites in mutants. This was also reflected in the loss of typical placodal morphology in the mutant epithelium. On the other hand, escaped WFs developed normally, as shown by the analysis in Supp Fig 1A-B demonstrating their normal size. These data suggest that development of escaped WFs is not delayed, because delayed follicles would appear smaller. To strengthen this conclusion, we assessed whisker development at E18.5 in Meis2 cKO mice by EDAR staining; the results are shown in the newly added Supplementary Figure 2. This experiment revealed that the whisker phenotype persisted until E18.5; therefore, this phenotype cannot be explained by a developmental delay.

      As far as quantification is concerned, we had already quantified the number of whiskers in controls and mutants at E12.5 and E13.5 in all whole-mount experiments we performed, i.e. Shh ISH and SOX9 or EDAR whole-mount IFC. We pooled all these numbers and calculated the whisker number reduction to 5.7+/-2.0% at E12.5 and 17.1+/-5.9% at E13.5 (lines 132-134).

      (2) The authors show that single-cell sequencing reveals a reduction in the pre-DC population, reduced proliferation, and changes in cell adhesion and ECM. However, these changes appear to affect most mesenchymal cells, not just pre-DCs. Moreover, since E12.5 already contains WFs at different stages of development, as well as pre-DCs and DCs, it becomes challenging to connect these mesenchymal changes directly to WF development. Did the authors attempt to re-cluster only Cluster 2 to determine if a specific subpopulation is missing in Meis2 cKO? Alternatively, focusing on additional secreted molecules whose expression is disrupted across different clusters in Meis2 cKO could provide insights, especially since mesenchymal-epithelial communication is often mediated through secreted molecules. Did the authors include epithelial cells in the single-cell sequencing, can they look for changes in mesenchyme-epithelial cell interactions (Cell Chat) to indicate a possible mechanism?

      We agree with the reviewer that the effects of Meis2 on cell proliferation and on the expression of cell adhesion and ECM markers are more general because they take place in the whole underlying mesenchyme. Our genetic tools did not allow specific targeting of DCs or pre-DCs. Nonetheless, we trust that our data show that mesenchymal Meis2 is required for the initial steps of WF development, including Pc formation. As far as the bioinformatics data are concerned, this dataset was taken from the large dataset GSE262468 covering the whole craniofacial region, which led to very limited cell numbers in cluster 2 (DC): WT_E12_5, 28; WT_E13_5, 131; MUT_E12_5, 19; MUT_E13_5, 28. Unfortunately, such small cell numbers did not allow further sub-clustering, efficient normalization, integration, or conclusions from their transcriptional profiles. Although a number of interesting differentially expressed genes were identified (see supplementary datasets), none of them convincingly pointed to a plausible secreted-molecule candidate.

      We agree with the reviewer that CellChat analysis could provide a robust indication of mesenchymal-epithelial communication; however, our datasets included only the mesenchymal cell population (Wnt1-Cre2 progeny), and epithelial cells were excluded by FACS prior to scRNA-seq (Hudacova et al. https://doi.org/10.1016/j.bone.2024.117297).
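      To make the scale of the limitation concrete, the cluster 2 (DC) cell counts quoted above can be tabulated as in the minimal sketch below. The counts are those reported in the response; the cutoff `MIN_CELLS_FOR_SUBCLUSTERING` and the helper `underpowered_samples` are hypothetical, chosen only for illustration, and are not values or methods from the study.

```python
# Cluster 2 (DC) cell counts per sample, as reported in the author response.
cluster2_counts = {
    "WT_E12_5": 28,
    "WT_E13_5": 131,
    "MUT_E12_5": 19,
    "MUT_E13_5": 28,
}

# Hypothetical cutoff for illustration only; not a value from the study.
MIN_CELLS_FOR_SUBCLUSTERING = 50


def underpowered_samples(counts, min_cells):
    """Return the names of samples whose cell count is below min_cells."""
    return [name for name, n in counts.items() if n < min_cells]


total_cells = sum(cluster2_counts.values())
too_small = underpowered_samples(cluster2_counts, MIN_CELLS_FOR_SUBCLUSTERING)

print(f"Total cluster 2 cells: {total_cells}")
print(f"Samples below the illustrative cutoff: {too_small}")
```

      Under this illustrative cutoff, three of the four samples fall short, which is consistent with the authors' statement that sub-clustering of cluster 2 was not feasible.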

      (3) The authors aim to link Meis2 expression in the mesenchyme with epithelial Wnt signaling by analyzing Lef1, bat-gal, Axin1, and Wnt10b expression. However, the changes described in the figures are unclear, and the phenotype appears highly variable, making it difficult to establish a connection between Meis2 and Wnt signaling. For instance, some follicles and pre-condensates are Lef1 positive in Meis2 cKO. Including quantification or providing a clearer explanation could help clarify the relationship between mesenchymal Meis2 and Wnt signaling in both epidermal and mesenchymal cells. Did the authors include epithelial cells in the sequencing? Could they use single-cell analysis to demonstrate changes in Wnt signaling?

      We have now analyzed changes in LEF1 staining intensity in the epithelium and in the upper dermis. According to these quantifications, we observed a considerable decline in the number of LEF1+ placodes in the epithelium, which corresponds to the lower number of placodes. On the other hand, LEF1 intensity in the ‘escaped’ placodes was similar between controls and mutants. The LEF1 signal in the upper dermis is very strong overall, and its quantification did not reveal any changes in the DC and non-DC regions of the upper dermis. These data corroborate our conclusion that Meis2 in the mesenchyme is not crucial for dermal Wnt signaling but is required for induction of LEF1 expression in the epithelium. However, once ‘escaper’ placodes appear, they display normal Wnt signaling in the Pc and DC and normal subsequent development. These quantitative data have been added to the revised manuscript (lines 247-260).

      (4) Existing literature, including studies on Neurog KO and NGF KO, as well as the references cited by the authors, suggest that nerves are unlikely to mediate WF development. While the authors conduct a thorough analysis of WF development in Neurog KO, further supporting this notion, this point may not be central to the current work. Additionally, the claim that Meis2 influences trigeminal nerve patterning requires further analysis and quantification for validation.

      We agree with the reviewer that analysis of the Neurogenin1 knockout mice should not be central to this report. Nonetheless, a thorough analysis of WF development in Neurog1 KO was needed to distinguish between two possible mechanisms: the whisker phenotype in Meis2 cKO results from (1) impaired nerve branching or (2) a function of Meis2 in the mesenchyme. We will modify the text accordingly to make this clearer to readers. We also agree that nerve branching was not extensively analyzed in the current study, but two samples from mutant mice were provided (Fig1 and Supp Videos), reflecting the consistency of the phenotype (see also Machon et al. 2015). This section was not central to this report either, but it led us to focus fully on the mesenchyme. We think that Meis2 function in cranial nerve development is very interesting and deserves a separate study.

      We have edited the introduction to reflect the literature better (lines 70-79).

      (5) Meis2 expression seems reduced but has not entirely disappeared from the mesenchyme. Can the authors provide quantification?

      We attempted to quantify MEIS2 staining in the snout dermis. However, the background fluorescence made reliable quantification challenging. Additionally, since the dermal region where MEIS2 expression is relevant for inducing WF formation is not yet known, we were unable to determine which regions to analyze. Instead, we have now added three additional images from multiple regions of the snout sections stained with MEIS2 antibody in Supplementary Figure 1C. We believe the newly added images make our conclusion that MEIS2 is efficiently deleted in the mutants more convincing.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, Kaplan et al. study mesenchymal Meis2 in whisker formation and the links between whisker formation and sensory innervation. To this end, they used conditional deletion of Meis2 using the Wnt1 driver. Whisker development was arrested at the placode induction stage in Meis2 conditional knockouts leading to the absence of expression of placodal genes such as Edar, Lef1, and Shh. The authors also show that branching of trigeminal nerves innervating whisker follicles was severely affected but that whiskers did form in the complete absence of trigeminal nerves.

      Strengths:

      The analysis of Meis2 conditional knockouts convincingly shows a lack of whisker formation and all epithelial whisker/hair placode markers were analyzed. Using Neurog1 knockout mice, the authors show equally convincingly that whiskers and teeth develop in the complete absence of trigeminal nerves.

      We thank the reviewer for valuable comments that will help improve our study.

      Weaknesses:

      The manuscript does not provide much mechanistic insight as to why mesenchymal Meis2 leads to the absence of whisker placodes. Using a previously generated scRNA-seq dataset they show that two early markers of dermal condensates, Foxd1 and Sox2, are downregulated in Meis2 mutants. However, given that placodes and dermal condensates do not form in the mutants, this is not surprising and their absence in the mutants does not provide any direct link between Meis2 and Foxd1 or Sox2. (The absence of a structure evidently leads to the absence of its markers.)

      We apologize for the unclear explanation of our data. We meant that Meis2 is functionally upstream of Foxd1 because Foxd1 is reduced upon Meis2 deletion. This means that during WF formation, Meis2 operates before Foxd1 induction; it does not necessarily mean that Meis2 directly controls expression of Foxd1. We agree with the reviewer’s note that Foxd1 and Sox2, as known DC markers, decline because the number of WFs declines. We wanted to convince readers that Meis2 operates very early in the GRN hierarchy during WF development. We also admit that we provide limited mechanistic insight into Meis2 function as a transcription factor. We think that this weak point does not lower the value of the report, which shows an indispensable role of Meis2 in WFs.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The text could benefit from editing.

      We have proofread the text.

      Some information is missing from the materials and methods section - a description of sequenced cells, the ISH protocol used, etc.

      The Methods section has been updated. The single-cell experiments were performed and described in detail by Hudacova et al. 2025 (https://doi.org/10.1016/j.bone.2024.117297); we used these datasets for the scRNA-seq analysis, which is described sufficiently in that paper. A reference for the standard in situ protocol has been added.

      Reviewer #2 (Recommendations for the authors):

      In the Introduction of the paper, the authors raise the question of the role of innervation in whisker follicle induction: "It has been speculated that early innervation plays a role in initiating WF formation (ref. 1)"...and..."this revives the previous speculations that axonal network may be involved in WF positioning". However, the authors forget to mention that Wrenn & Wessells, 1984 (reference 1 in the manuscript) reached exactly the opposite conclusion and stated, e.g., "Nerve trunks and branches are present in the maxillary process well before any sign of vibrissa formation. Because innervation is so widespread there appears to be no immediate temporal correlation between the outgrowth of a nerve branch to a site and the generation of a vibrissa there. Furthermore, at the time just prior to the formation of the first follicle rudiment, there is little or no nerve branching to the presumptive site of that first follicle while branches are found more dorsally where vibrissae will not form until later." Therefore, I find that referring to the paper by Wrenn & Wessells is somewhat misleading. The fact that whisker follicles develop in ex vivo cultured whisker pads further hints that innervation is unlikely to play a role in whisker follicle induction.

      The Introduction also hints at the role of innervation in tooth induction but forgets to refer to the literature that shows exactly the opposite. Based on the evidence it rather appears that the developing tooth regulates the establishment of its own nerve supply, not that the nerves would regulate induction of tooth development.

      In my opinion, the Introduction should be partially rewritten to better reflect the literature.

      The introduction has been revised to better reflect the literature on the role of innervation in WF and tooth development. Lines 70-87.

      The authors conclude that Meis2 is upstream of Foxd1, but the evidence is based on the lack of Foxd1 expression in Meis2 mutants. However, as whiskers do not form, evidently all markers are also absent. More direct evidence of Meis2 being upstream of Foxd1 (or Sox2) should be presented to consolidate the conclusions.

      We have already responded to this point above in the Weaknesses section. The text is now modified so that the interpretation is correct. Lines 407-409.

      Other comments:

      Author contributions state that XX performed experiments but the author list does not include anyone with such initials.

      This error has been corrected in revision.

    1. eLife Assessment

      This valuable study presents a computational model that simulates walking motions in Drosophila and suggests that, if sensorimotor delays in the neural circuitry were any longer, the system would be easily destabilized by external perturbations. The hierarchical control model is sensible and the evidence supporting the conclusions convincing. The modular model, which has many interacting components with varying degrees of biological realism, will serve as a well-grounded starting point for future studies that incorporate richer or more complete empirical data.

    2. Reviewer #1 (Public Review):

      Summary:

      In this work, the authors present a novel, multi-layer computational model of motor control to produce realistic walking behaviour of a Drosophila model in the presence of external perturbations and under sensory and motor delays. The novelty of their model of motor control is that it is modular, with divisions inspired by the fly nervous system, with one component based on deep learning while the rest are based on control theory. They show that their model can produce realistic walking trajectories. Given the mostly reasonable assumptions of their model, they convincingly show that the sensory and motor delays present in the fly nervous system are the maximum allowable for robustness to unexpected perturbations.

      Their fly model outputs torque at each joint in the leg, and their dynamics model translates these into movements, resulting in time-series trajectories of joint angles. Inspired by the anatomy of the fly nervous system, their fly model is a modular architecture that separates motor control at three levels of abstraction:

      (1) oscillator-based model of coupling of phase angles between legs,

      (2) generation of future joint-angle trajectories based on the current state and inputs for each leg (the trajectory generator), and

      (3) closed-loop control of the joint-angles using torques applied at every joint in the model (control and dynamics).

      These three levels of abstraction ensure coordination between the legs, future predictions of desired joint angles, and corrections to deviations from desired joint-angle trajectories. The parameters of the model are tuned in the absence of external perturbations using experimental data of joint angles of a tethered fly. A notable disconnect from reality is that the dynamics model used does not model the movement of the body and ground contacts as is the case in natural walking, nor the movement of a ball for a tethered fly, but instead something like legs moving in the air for a tethered fly.
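
      The first of the three levels, oscillator-based coupling of phase angles between legs, can be illustrated with a minimal Kuramoto-style sketch. This is not the authors' implementation: the coupling form, the gain, the stepping frequency, the tripod-like phase offsets, and the names `simulate_phase_coupling`, `wrapped_diff`, and `TRIPOD_OFFSETS` are all assumptions chosen only to show how phase coupling can restore inter-leg coordination.

```python
import math

# Assumed tripod-like pattern for legs ordered L1, L2, L3, R1, R2, R3
# (illustrative only, not taken from the manuscript).
TRIPOD_OFFSETS = [0.0, math.pi, 0.0, math.pi, 0.0, math.pi]


def simulate_phase_coupling(phases, offsets, freq_hz=10.0, gain=5.0,
                            dt=0.001, steps=5000):
    """Euler-integrate phase oscillators that are pulled toward target offsets."""
    omega = 2.0 * math.pi * freq_hz  # shared stepping frequency (rad/s)
    phases = list(phases)
    n = len(phases)
    for _ in range(steps):
        updated = []
        for i in range(n):
            # Each leg is nudged so that (phase_j - phase_i) approaches
            # (offset_j - offset_i) for every other leg j.
            coupling = sum(
                math.sin(phases[j] - phases[i] - (offsets[j] - offsets[i]))
                for j in range(n) if j != i
            )
            updated.append(phases[i] + dt * (omega + gain * coupling))
        phases = updated
    return phases


def wrapped_diff(a, b):
    """Smallest signed angular difference a - b, in (-pi, pi]."""
    return math.atan2(math.sin(a - b), math.cos(a - b))
```

      Starting from phases that are perturbed away from the target pattern, the pairwise phase differences in this toy system settle back to the assumed offsets, which is the sense in which such a layer "ensures coordination between the legs".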

      In order to validate the realism of the generated simulated walking trajectories, the authors compare various attributes of simulated to real tethered fly trajectories and show qualitative and quantitative similarities, including using a novel metric coined as Kinematic Similarity (KS). The KS score of a trajectory is a measure of the likelihood that the trajectory belongs to the distribution of real trajectories estimated from the experimental data. While such a metric is a useful tool to validate the quality of simulated data, there is some room for improvement in the actual computation of this score. For instance, the KS score is computed for any given time-window of walking simulation using a fraction of information from the joint-angle trajectories. It is unclear if the remaining information in joint-angle trajectories that are not used in the computation of the KS score can be ignored in the context of validating the realism of simulated walking trajectories.

      The authors validate simulated walking trajectories generated by the trained model under a range of sensorimotor delays and external perturbations. The trained model is shown to generate realistic joint-angle trajectories in the presence of external perturbations as long as the sensorimotor delays are constrained within a certain range. This range of sensorimotor delays is shown to be comparable to experimental measurements of sensorimotor delays, leading to the conclusion that the fly nervous system is just fast enough to be robust to perturbations.

      Strengths:

      This work presents a novel framework to simulate Drosophila walking in the presence of external perturbations and sensorimotor delay. Although the model makes some simplifying assumptions, it has sufficient complexity to generate new, testable hypotheses regarding motor control in Drosophila. The authors provide evidence for realistic simulated walking trajectories by comparing simulated trajectories generated by their trained model with experimental data using a novel metric proposed by the authors. The model proposes a crucial role in future predictions to ensure robust walking trajectories against external perturbations and motor delay. Realistic simulations under a range of prediction intervals, perturbations, and motor delays generating realistic walking trajectories support this claim. The modular architecture of the framework provides opportunities to make testable predictions regarding motor control in Drosophila. The work can be of interest to the Drosophila community interested in digitally simulating realistic models of Drosophila locomotion behaviors, as well as to experimentalists in generating testable hypotheses for novel discoveries regarding neural control of locomotion in Drosophila. Moreover, the work can be of broad interest to neuroethologists, serving as a benchmark in modelling animal locomotion in general.

      Weaknesses:

      As the authors acknowledge in their work, the control and dynamics model makes some simplifying assumptions about Drosophila physics/physiology in the context of walking. For instance, the model does not incorporate ground contact forces and inertial effects of the fly's body. It is not clear how these simplifying assumptions would affect some of the quantitative results derived by the authors. The range of tolerable values of sensorimotor delays that generate realistic walking trajectories is shown to be comparable with sensorimotor delays inferred from physiological measurements. It is unclear if this comparison is meaningful in the context of the model's simplifying assumptions. The authors propose a novel metric coined as Kinematic Similarity (KS) to distinguish realistic walking trajectories from unrealistic walking trajectories. Defining such an objective metric to evaluate the model's predictions is a useful exercise, and could potentially be applied to benchmark other computational animal models that are proposed in the future. However, the KS score proposed in this work is calculated using only the first two PCA modes that cumulatively account for less than 50% of the variance in the joint angles. It is not obvious that the information in the remaining PCA modes may not change the log-likelihood that occurs in the real walking data.

      Comments on revisions:

      The authors have addressed the concerns and questions raised in the original review.

    3. Reviewer #2 (Public Review):

      Summary:

      In this study, Karashchuk et al. develop a hierarchical control system to control the legs of a dynamic model of the fly. They intend to demonstrate that temporal delays in sensorimotor processing can destabilize walking and that the fly's nervous system may be operating with delays as long as could possibly be corrected for.

      Strengths:

      Overall, the approach the authors take is impressive. Their model is trained using a huge dataset of animal data, which is a strength. Their model was not trained to reproduce animal responses to perturbations, but it successfully rejects small perturbations and continues to operate stably. Their results are consistent with the literature, that sensorimotor delays destabilize movements.

      Weaknesses:

      The model is sophisticated and interesting, but the reviewer has great concerns regarding this manuscript's contributions, as laid out in the abstract:

      (1) Much simpler models can be used to show that delays in sensorimotor systems destabilize behavior (e.g., Bingham, Choi, and Ting 2011; Ashtiani, Sarvestani, and Badri-Sproewitz 2021), so why create this extremely complex system to test this idea? The complexity of the system obscures the results and leaves the reviewer wondering if the instability is due to the many, many moving parts within the model. The reviewer understands (and appreciates) that the authors tested the impact of the delay in a controlled way, which supports their conclusion. However, the reviewer thinks the authors did not use the most parsimonious model possible, and as such, leave many possible sources for other causes of instability.

      (2) In a related way, the reviewer is not sure that the elements the authors introduced reflect the structure or function of the fly's nervous system. For example, optimal control is an active field of research and is behind the success of many-legged robots, but the reviewer is not sure what evidence exists that suggests the fly ventral nerve cord functions as an optimal controller. If this were bolstered with additional references, the reviewer would be less concerned.

      (3) "The model generates realistic simulated walking that matches real fly walking kinematics...". The reviewer appreciates the difficulty in conducting this type of work, but the reviewer cannot conclude that the kinematics "match real fly walking kinematics". The range of motion of several joints is 30% too small compared to the animal (Figure 2B) and the reviewer finds the video comparisons unpersuasive. The reviewer would understand if there were additional constraints, e.g., the authors had designed a robot that physically could not complete the prescribed motions. However the reviewer cannot think of a reason why this simulation could not replicate the animal kinematics with arbitrary precision, if that is the goal.

      Comments on revisions:

      The authors have addressed the concerns and questions raised in the original review.

    4. Author response:

      The following is the authors’ response to the original reviews.

      We thank the editor and reviewers for their supportive comments about our modeling approach and conclusions, and for raising several valid concerns; we address them briefly below. In addition, a detailed, point-by-point response to the reviewers’ comments is provided below, along with the additions and edits we have made to the revised manuscript.

      Concerns about model’s biological realism and impact on interpretations

      The goal of this paper was to use an interpretable and modular model to investigate the impact of varying sensorimotor delays. Aspects of the model (e.g. layered architecture, modularity) are inspired by biology; at the same time, necessary abstractions and simplifications (e.g. using an optimal controller) are made for interpretability and generalizability, and they reflect common approaches from past work. The hypothesized effects of certain simplifying assumptions are discussed in detail in Section 3.5. Furthermore, the modularity of our model allows us to readily incorporate additional biological realism (e.g. biomechanics, connectomics, and neural dynamics) in future work. In the revision, we have added citations and edits to the text to clarify these points.

      Concerns that the model is overly complex

      To investigate the impact of sensorimotor delays on locomotion, we built a closed-loop model that recapitulates the complex joint trajectories of fly walking. We agree that locomotion models face a tradeoff between simplicity/interpretability and realism; therefore, we developed a model that was as simple and interpretable as possible, while still reasonably recapitulating joint trajectories and generalizing to novel simulation scenarios. Along these lines, we also did not select a model that primarily recreates empirical data, as this would hinder generalizability and add unnecessary complexity to the model. We do not think these design choices are significant weaknesses of this model; in fact, few comparable models account for all joints involved in locomotion, and fewer explicitly compare model kinematics with kinematics from data. We have added citations and edits to the text to clarify these points in the revision.

      Concerns about the validity of the Kinematic Similarity (KS) metric to evaluate walking

      We chose to incorporate only the first two PCA modes in the KS metric because the kernel density estimator performs poorly for high-dimensional data. Our primary use of this metric was to indicate whether the simulated fly continues walking in the presence of perturbations. For technical reasons, it is not feasible to perform equivalent experiments on real walking flies, which is one of the reasons we explore this phenomenon with the model. We note the dramatic shift from walking to nonwalking as delay increases (Figure 5). To be thorough, in the revision, we have investigated the effect of incorporating additional PCA modes, and whether this affects the interpretation of our results. We have additionally added to the discussion and presentation of the KS metric to clarify its purpose in this study. We agree with the reviewers that the KS metric is too coarse to reflect fine details of joint kinematics; indeed, in the unperturbed case, we evaluate our model’s performance using other metrics based on comparisons with empirical data (Figures 2, 7, 8).

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this work, the authors present a novel, multi-layer computational model of motor control to produce realistic walking behaviour of a Drosophila model in the presence of external perturbations and under sensory and motor delays. The novelty of their model of motor control is that it is modular, with divisions inspired by the fly nervous system, with one component based on deep learning while the rest are based on control theory. They show that their model can produce realistic walking trajectories. Given the mostly reasonable assumptions of their model, they convincingly show that the sensory and motor delays present in the fly nervous system are the maximum allowable for robustness to unexpected perturbations.

      Their fly model outputs torque at each joint in the leg, and their dynamics model translates these into movements, resulting in time-series trajectories of joint angles. Inspired by the anatomy of the fly nervous system, their fly model is a modular architecture that separates motor control at three levels of abstraction:

      (1) oscillator-based model of coupling of phase angles between legs,

      (2) generation of future joint-angle trajectories based on the current state and inputs for each leg (the trajectory generator), and

      (3) closed-loop control of the joint-angles using torques applied at every joint in the model (control and dynamics).

      These three levels of abstraction ensure coordination between the legs, future predictions of desired joint angles, and corrections to deviations from desired joint-angle trajectories. The parameters of the model are tuned in the absence of external perturbations using experimental data of joint angles of a tethered fly. A notable disconnect from reality is that the dynamics model used does not model the movement of the body and ground contacts as is the case in natural walking, nor the movement of a ball for a tethered fly, but instead something like legs moving in the air for a tethered fly.

      In order to validate the realism of the generated simulated walking trajectories, the authors compare various attributes of simulated to real tethered fly trajectories and show qualitative and quantitative similarities, including using a novel metric coined as Kinematic Similarity (KS). The KS score of a trajectory is a measure of the likelihood that the trajectory belongs to the distribution of real trajectories estimated from the experimental data. While such a metric is a useful tool to validate the quality of simulated data, there is some room for improvement in the actual computation of this score. For instance, the KS score is computed for any given time-window of walking simulation using a fraction of information from the joint-angle trajectories. It is unclear if the remaining information in joint-angle trajectories that are not used in the computation of the KS score can be ignored in the context of validating the realism of simulated walking trajectories.

      The authors validate simulated walking trajectories generated by the trained model under a range of sensorimotor delays and external perturbations. The trained model is shown to generate realistic joint-angle trajectories in the presence of external perturbations as long as the sensorimotor delays are constrained within a certain range. This range of sensorimotor delays is shown to be comparable to experimental measurements of sensorimotor delays, leading to the conclusion that the fly nervous system is just fast enough to be robust to perturbations.

      Strengths:

      This work presents a novel framework to simulate Drosophila walking in the presence of external perturbations and sensorimotor delay. Although the model makes some simplifying assumptions, it has sufficient complexity to generate new, testable hypotheses regarding motor control in Drosophila. The authors provide evidence for realistic simulated walking trajectories by comparing simulated trajectories generated by their trained model with experimental data using a novel metric proposed by the authors. The model proposes a crucial role in future predictions to ensure robust walking trajectories against external perturbations and motor delay. Realistic simulations under a range of prediction intervals, perturbations, and motor delays generating realistic walking trajectories support this claim. The modular architecture of the framework provides opportunities to make testable predictions regarding motor control in Drosophila. The work can be of interest to the Drosophila community interested in digitally simulating realistic models of Drosophila locomotion behaviors, as well as to experimentalists in generating testable hypotheses for novel discoveries regarding neural control of locomotion in Drosophila. Moreover, the work can be of broad interest to neuroethologists, serving as a benchmark in modelling animal locomotion in general.

      We thank the reviewer for their positive comments.

      Weaknesses:

      As the authors acknowledge in their work, the control and dynamics model makes some simplifying assumptions about Drosophila physics/physiology in the context of walking. For instance, the model does not incorporate ground contact forces and inertial effects of the fly's body. It is not clear how these simplifying assumptions would affect some of the quantitative results derived by the authors. The range of tolerable values of sensorimotor delays that generate realistic walking trajectories is shown to be comparable with sensorimotor delays inferred from physiological measurements. It is unclear if this comparison is meaningful in the context of the model's simplifying assumptions.

      We now discuss how some of these assumptions affect the quantitative results in the section “Towards biomechanical and neural realism”. We reproduce the relevant sentences below:

      “The inclusion of explicit leg-ground contact interactions would also make it harder for the model to recover when perturbed, because perturbations during walking often occur upon contact with the ground (e.g. the ground is slippery or bumpy).”

      “We anticipate that the increased sensory resolution from more detailed proprioceptor models and the stability from mechanical compliance of limbs in a more detailed biomechanical model would make the system easier to control and increase the allowable range of delay parameters. Conversely, we expect that modeling the nonlinearity and noise inherent to biological sensors and actuators may decrease the allowable range of delay parameters.”

      The authors propose a novel metric coined as Kinematic Similarity (KS) to distinguish realistic walking trajectories from unrealistic walking trajectories. Defining such an objective metric to evaluate the model's predictions is a useful exercise, and could potentially be applied to benchmark other computational animal models that are proposed in the future. However, the KS score proposed in this work is calculated using only the first two PCA modes that cumulatively account for less than 50% of the variance in the joint angles. It is not obvious that the information in the remaining PCA modes may not change the log-likelihood that occurs in the real walking data.

      The primary reason we designed the KS metric was to determine whether the simulated fly continues walking in the presence of perturbations. We initially limited the analysis of the KS to the first 2 principal components. For completeness, we now investigate the additional principal components in Appendix 9 and the effect of evaluating KS with different numbers of components in Appendix 10. 

      Overall, the results look similar when including additional components for impulse perturbations. For stochastic perturbations, the range of similar walking decreases as we increase the number of components used to evaluate walking kinematics. Comparing this with Appendix 9, which shows that higher components represent higher frequencies of the walking cycle, we conclude that at the edge of stability for delays (where the sum of sensory and actuation delays is about 40 ms), flies can continue walking but with impaired higher frequencies (relative to no perturbations) during and after perturbation.

      We added the following text in the methods:

      “We chose 2 dimensions for PCA for two key reasons. First, these 2 dimensions alone accounted for a large portion of the variance in the data (52.7% total, with 42.1% for first component and 10.6% for second component). There was a big drop in variance explained from the first to the second component, but no sudden drop in the next 10 components (see Appendix 9). Second, the KDE procedure only works effectively in low-dimensional spaces, and the minimal number of dimensions needed to obtain circular dynamics for walking is 2. We investigate the effect of varying the number of dimensions of PCA in Appendix 10.”
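
      The density-estimation step behind a score of this kind can be sketched as follows. This is a minimal illustration, assuming the joint-angle data have already been projected onto the first two principal components; it is not the authors' code. The synthetic noisy loop standing in for real walking, the isotropic Gaussian kernel, the bandwidth of 0.1, and the function names are all assumptions.

```python
import math
import random


def kde_log_density(point, samples, bandwidth):
    """Log of a 2D isotropic Gaussian KDE evaluated at `point`."""
    log_norm = -math.log(2.0 * math.pi * bandwidth ** 2) - math.log(len(samples))
    exponents = [
        -((point[0] - sx) ** 2 + (point[1] - sy) ** 2) / (2.0 * bandwidth ** 2)
        for sx, sy in samples
    ]
    peak = max(exponents)  # log-sum-exp trick for numerical stability
    return log_norm + peak + math.log(sum(math.exp(e - peak) for e in exponents))


def mean_log_likelihood(trajectory, samples, bandwidth=0.1):
    """Average log-density of a trajectory under the reference KDE."""
    total = sum(kde_log_density(p, samples, bandwidth) for p in trajectory)
    return total / len(trajectory)


def noisy_loop(n, radius, noise, rng):
    """Synthetic stand-in for walking cycles traced out in 2D PC space."""
    points = []
    for _ in range(n):
        angle = rng.uniform(0.0, 2.0 * math.pi)
        r = radius + rng.gauss(0.0, noise)
        points.append((r * math.cos(angle), r * math.sin(angle)))
    return points


rng = random.Random(0)
reference = noisy_loop(500, 1.0, 0.05, rng)   # placeholder for real walking data
walking = noisy_loop(50, 1.0, 0.05, rng)      # simulated bout that keeps cycling
stalled = [(rng.gauss(0.0, 0.05), rng.gauss(0.0, 0.05)) for _ in range(50)]
```

      In this toy setting a trajectory that keeps tracing the loop receives a much higher mean log-likelihood than one that collapses toward the origin, which matches the stated purpose of the metric: detecting whether walking continues, rather than resolving fine kinematic detail.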

      (Note that we have corrected the percentage of variance accounted for by the principal components, as these numbers were from an older analysis prior to the first draft.)

      We also reference Appendix 10 in the results:

      “We observed that robust walking was not contingent on the specific values of motor and sensory delay, but rather the sum of these two values (Fig. 5E). Furthermore, as delay increases, higher frequencies of walking are impacted first before walking collapses entirely (Appendix 10).”
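
      Why stability would depend on the sum of the two delays can be seen in a toy feedback loop. The sketch below is not the authors' model: it assumes a single integrator plant with proportional feedback, where the measurement passes through a sensory delay and the command through a motor delay. The gain, time step, duration, and the name `late_peak_deviation` are illustrative assumptions, and the delay values carry no physiological meaning.

```python
from collections import deque


def late_peak_deviation(sensory_delay_ms, motor_delay_ms, gain=25.0,
                        dt_ms=1, duration_ms=3000):
    """Simulate dx/dt = u with delayed proportional feedback.

    Returns the largest |x| seen in the second half of the run:
    near zero if the loop recovers, large if it is destabilized.
    """
    dt = dt_ms / 1000.0
    # FIFO buffers act as pure transport delays, pre-filled with "no signal".
    sensed = deque([0.0] * (sensory_delay_ms // dt_ms))
    commanded = deque([0.0] * (motor_delay_ms // dt_ms))
    x = 1.0  # deviation left behind by an impulse perturbation at t = 0
    n_steps = duration_ms // dt_ms
    peak = 0.0
    for step in range(n_steps):
        sensed.append(x)
        measurement = sensed.popleft()         # state as it was one sensory delay ago
        commanded.append(-gain * measurement)  # proportional correction
        u = commanded.popleft()                # command issued one motor delay ago
        x += dt * u                            # Euler step of the integrator plant
        if step >= n_steps // 2:
            peak = max(peak, abs(x))
    return peak
```

      In this single-loop setting, splitting a 30 ms total as 10 ms sensory plus 20 ms motor or as 20 ms plus 10 ms gives the same trajectory, because only the round-trip delay enters the loop; a sufficiently long total delay makes the same controller oscillate with growing amplitude.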

      Reviewer #2 (Public Review):

      Summary:

      In this study, Karashchuk et al. develop a hierarchical control system to control the legs of a dynamic model of the fly. They intend to demonstrate that temporal delays in sensorimotor processing can destabilize walking and that the fly's nervous system may be operating with delays as long as could possibly be corrected for.

      Strengths:

      Overall, the approach the authors take is impressive. Their model is trained using a huge dataset of animal data, which is a strength. Their model was not trained to reproduce animal responses to perturbations, but it successfully rejects small perturbations and continues to operate stably. Their results are consistent with the literature, that sensorimotor delays destabilize movements.

      Weaknesses:

      The model is sophisticated and interesting, but the reviewer has great concerns regarding this manuscript's contributions, as laid out in the abstract:

      (1) Much simpler models can be used to show that delays in sensorimotor systems destabilize behavior (e.g., Bingham, Choi, and Ting 2011; Ashtiani, Sarvestani, and Badri-Sproewitz 2021), so why create this extremely complex system to test this idea? The complexity of the system obscures the results and leaves the reviewer wondering if the instability is due to the many, many moving parts within the model. The reviewer understands (and appreciates) that the authors tested the impact of the delay in a controlled way, which supports their conclusion. However, the reviewer thinks the authors did not use the most parsimonious model possible, and as such, leave many possible sources for other causes of instability.

      We thank the reviewer for this observation — we agree that we did not make the goal of the work quite clear. The goal of this paper was to build an interpretable and generalizable model of fly walking, which was then used to investigate varying sensorimotor delays in the context of locomotion. To this end, we used a modular model to recreate walking kinematics, and then investigated the effect of delays on locomotion. Locomotion in itself is a complex phenomenon — thus, we have chosen a model that is complex enough to reasonably recapitulate joint trajectories, while remaining interpretable.

      We have clarified this in the text near the end of the introduction:

      “Here, we develop a new, interpretable, and generalizable model of fly walking, which we use to investigate the impact of varying sensorimotor delays in Drosophila locomotion.”

      We also emphasize the investigation of sensorimotor delays in the context of locomotion in the beginning of the “Effect of sensory and motor delays on walking” section:

      “... we used our model to investigate how changing sensory and motor delays affects locomotor robustness.”

      We also remark that while they are very relevant papers for our work, neither of the prior papers focus on locomotion: the first involves a 2D balance model of a biped, and the second involves drop landings of quadrupeds.

      Lastly, we note that the investigation of delay is not the only use for this model: in the future, this model can also be used to study other aspects of locomotion such as the role of proprioceptive feedback (see “Role of proprioceptive feedback in fly walking” section). The layered framework of the model can also be extended to other animals and locomotor strategies (see “Layered model produces robust walking and facilitates local control” section).

      (2) In a related way, the reviewer is not sure that the elements the authors introduced reflect the structure or function of the fly's nervous system. For example, optimal control is an active field of research and is behind the success of many-legged robots, but the reviewer is not sure what evidence exists that suggests the fly ventral nerve cord functions as an optimal controller. If this were bolstered with additional references, the reviewer would be less concerned.

      We thank the reviewer for the comment — we have now further clarified how our model elements reflect the fly’s nervous system. The elements we introduce are plausible but only loosely analogous to the fly’s nervous system. While we draw parallels from these elements to anatomy (e.g. in Fig 1A-B, and in the first paragraph of the Results section), we do not mean to suggest that these functional elements directly correspond to specific structures in the fly’s nervous system. A substantial portion of the suggested future work (see “Towards biomechanical and neural realism”) aims to bridge the gap between these functional elements and fly physiology, which is beyond the scope of this work. 

      We have added clarifying text to the Results section:

      “While the model is inspired by neuroanatomy, its components do not strictly correspond to components of the nervous system --- the construction of a neuroanatomically accurate model is deferred to future work (see Discussion).”

      In the specific case of optimal control: it is a theoretical model that predicts various aspects of motor control in humans, and there is evidence that it is implemented by the human nervous system (Todorov and Jordan, 2002; Scott, 2004; Berret et al., 2011). Based on this, we assume that optimal control is also a reasonable model for motor control as implemented by the fly nervous system. Fly movement makes use of proprioceptive feedback signals (Mendes et al., 2013; Pratt et al., 2024; Berendes et al., 2016), and optimal control is a plausible mechanism that incorporates feedback signals into movement.

      We have added the following clarifying text in the Results section: 

      “The optimal controller layer maintains walking kinematics in the presence of sensorimotor delays and helps compensate for external perturbations. This design was inspired by optimal control-based models of movements in humans (Todorov and Jordan, 2002; Scott, 2004; Berret et al., 2011)”

      (3) "The model generates realistic simulated walking that matches real fly walking kinematics...". The reviewer appreciates the difficulty in conducting this type of work, but the reviewer cannot conclude that the kinematics "match real fly walking kinematics". The range of motion of several joints is 30% too small compared to the animal (Figure 2B) and the reviewer finds the video comparisons unpersuasive. The reviewer would understand if there were additional constraints, e.g., the authors had designed a robot that physically could not complete the prescribed motions. However the reviewer cannot think of a reason why this simulation could not replicate the animal kinematics with arbitrary precision, if that is the goal.

      We agree with the reviewer that the model-generated kinematics are not perfectly indistinguishable from real walking kinematics, and now clarify this in the text. We also agree with the reviewer that one could build a model that precisely replicates real kinematics, but as they intuit, that was not our goal. Our goal was to build a model that both replicates animal kinematics, and is interpretable and generalizable (which allows us to investigate what happens when perturbations and varying sensorimotor delays are introduced). There is a trade-off between realism and generalizability — a simulation that fully recreates empirical data would require a model that is completely fit to data, which is likely to be more complex (in terms of parameters required) and less generalizable to novel scenarios. We have made design choices that result in a model that balances these trade-offs. We do not consider this to be a weakness of the model; in fact, few comparable models account for all joints involved in locomotion, and fewer explicitly compare model kinematics with kinematics from data.

      We have tempered the language in the abstract:

      “The model generates realistic simulated walking that resembles real fly walking kinematics”

      The tempered statement, we believe, is a fair characterization of the walking — it resembles but does not perfectly match real kinematics.

      We have also introduced clarifying text in the introduction:

      “Overall, existing walking models focus on either kinematic or physiological accuracy, but few achieve both, and none consider the effect of varying sensorimotor delays. Here, we develop a new, interpretable, and generalizable model of fly walking, which we use to investigate the impact of varying sensorimotor delays in Drosophila locomotion.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Potential typo on page 5:

      2.1.2 Joint kinematics trajectory generator

      Paragraph 4, last line: Original text - ".....it also estimates the current phase". Suggested correction - "...it also estimates the current phase velocity"

      Done

      Potential typo on page 8:

      2.3 Model maintains walking under unpredictable external perturbations.

      Paragraph 3, line 2: Original text - "...brief, unexpected force (e.g. legs slipping on an unstable surface)".

      Consider replacing force with motion, or providing an example of a force as opposed to displacement (slipping).

      Done

      Potential typo on page 8:

      2.3 Model maintains walking under unpredictable external perturbations.

      Paragraph 3, line 4: Original text - "The magnitude of this velocity is drawn from a normal distribution...".

      Is this really magnitude? If so, please discuss how the sign (+/-) is assigned to velocity, and how the normal distribution is centred so as to sample only positive values representing magnitude.

      Indeed the magnitude of the velocity is drawn from a normal distribution. A positive or negative sign is then assigned with equal odds. We have added text to clarify this:

      “The sign of the velocity was drawn separately so that there is equal likelihood for negative or positive perturbation velocities.”
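
      The sampling scheme described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the `sigma` parameter, and the choice to obtain a non-negative magnitude by taking the absolute value of a zero-mean normal draw are all assumptions, since the response does not state how the distribution is centred.

```python
import random

def sample_perturbation_velocity(sigma, rng=random):
    """Draw one signed perturbation velocity (hypothetical helper).

    Assumption: the magnitude is the absolute value of a zero-mean normal
    draw (a half-normal). The source does not say how the distribution is
    centred, so this is only one possible reading.
    """
    magnitude = abs(rng.gauss(0.0, sigma))      # non-negative magnitude
    sign = 1.0 if rng.random() < 0.5 else -1.0  # equal odds for + and -
    return sign * magnitude

rng = random.Random(0)  # seeded generator so the sketch is repeatable
samples = [sample_perturbation_velocity(2.0, rng) for _ in range(10000)]
frac_positive = sum(v > 0 for v in samples) / len(samples)
```

      Under this particular assumption the signed velocity is distributed exactly like a zero-mean normal variable; drawing the sign separately changes the result only if the magnitude distribution is centred away from zero.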

      Page 8:

      2.3 Model maintains walking under unpredictable external perturbations.

      In Paragraph 5: Why is the data reduced to only 2 dimensions? Could higher order PCA modes (cumulatively accounting for more than 50% variance in the data) not have distinguishing information between realistic and unrealistic walking trajectories?

      We provide a longer response for this in the public review above.

      Page 11:

      Why wouldn't a system trained in the presence of external perturbations perform better? What is the motivation to remove external perturbations during training?

      We agree that a system trained in the presence of external perturbations would probably perform better — however, we do not have data that contains walking with external perturbations. Nothing was removed — all the data used in this study involve a fly walking without perturbations.

      We have added a clarification:

      “our model maintains realistic walking in the presence of external dynamic perturbations, despite being trained only on data of walking without perturbations (no perturbation data was available).”

      Page 16:

      4.1 Tracking joint angles of D. melanogaster walking in 3D.

      Paragraph 1: Readers who wish to collect similar data might benefit if the authors specified the exposure time and the animal size in pixels (or the camera sensor format and field of view), in addition to the frame rate. Alternatively, consider mentioning the camera and lens part numbers provided by the manufacturer.

      This is a good point. We have updated the text to include these specifications:

      “We obtained fruit fly D. melanogaster walking kinematics data following the procedure previously described in (Karashchuk et al, 2021). Briefly, a fly was tethered to a tungsten wire and positioned on a frictionless spherical treadmill ball suspended on compressed air. Six cameras (Basler acA800-510um with Computar zoom lens MLM3X-MP) captured the movement of all of the fly's legs at 300 Hz. The fly size in pixels ranges from about 300x300 up to 700x500 pixels across the 6 cameras. Using Anipose, we tracked 30 keypoints on the fly, which are the following 5 points on each of the 6 legs: body-coxa, coxa-femur, femur-tibia, and tibia-tarsus joints, as well as the tip of the tarsus.”

      Potential typos on page 18:

      4.3.3 Training procedure

      Paragraph 2, line 1: Original text - "..(, p)"

      Do the authors mean "...(, )"

      Paragraph 2, line 2: Original text - "... (,, v, p)" Do the authors mean "... (,, v, )"?

      Paragraph 3, line 3: Original text - "... (,, v, p)" Do the authors mean "... (,, v, )"?

      Thank you for pointing out this issue. We have now fixed the phase p to be \phi to be consistent with the rest of the text.

      Paragraph 3, line 3: Original text - "...(\hat \theta)"

      Do the authors mean "(\theta_d)"? If not, please discuss the difference between \hat \theta and \theta_d.

      Thank you for pointing this out. \hat \theta and \theta_d were used interchangeably which is confusing. We have standardized our reference to the desired trajectory as \theta_d throughout the text.

      Page 19:

      Typo after eqn. (6):

      Original text: "where x := q - q, ... A and B are Jacobians with respect to...."

      Correction: "where x := q - q, ... Ac and Bc are Jacobians with respect to...."

      Similar corrections in eqn. 7 and eqn. 8: A and B should be replaced with Ac and Bc.

      Done

      Page 19, eqn. (10b):

      Should the last term be qd(t+T) as opposed to qd(t+1)?

      No: in fact (10a) contains the typo: it should be y(t+1) as opposed to y(t+T). This has been fixed.

      Page 19

      The authors' detailed description of the initial steps leading up to the dynamics model, involving the construction of the ODE, linearizing the system about the fixed point makes the text broadly accessible to the general reader. Similarly, adding some more description of the predictive model (eqn. 11 - 15) could improve the text's accessibility and the reader's appreciation for the model. This is especially relevant since the effects of sensorimotor delay and external perturbations, which are incorporated in the control and dynamics model, form a major contribution to this work. What do the matrices F, G, L, H, and K look like for the Drosophila model? Are there any differences between the model in Stenberg et al. (referenced in the paper) and the authors' model for predictive control? Are there any differences in the assumptions made in Stenberg et al. compared to the model presented in this work? The readers would likely also benefit from a figure showing the information flow in the model, and describing all the variables used in the predictive control model in eqn. 11 through eqn. 15 (analogous to Figure 1 in Stenberg et al. (2022)). Such a detailed description of the control and dynamics model would help the reader easily appreciate the assumptions made in modelling the effects of sensorimotor delay and external perturbations.

      Done

      Page 20:

      Eqn. 12: Should z(t+1) be z(t+T) instead?

      Similar comment for eqn. 14

      No: we made a mistake in (10a); there should be no (t+T) terms; all terms should be (t+1) terms to reflect a standard discrete-time difference equation.

      Eqn. 13: r(t) can be defined explicitly

      Done

      4.5 Generate joint trajectories of the complete model with perturbations

      Paragraph 2, line 2: Please read the previous comment

      \hat \theta and \theta_d were previously used interchangeably which is confusing. We have standardized our reference to the desired trajectory as \theta_d throughout the text.

      Original text - "Every 8 timesteps, we set \theta_d := \theta...."

      Does this mean \theta_d is set to \theta? If so, the motivation for this is not clear.

      We mean that \theta_d is set to be equal to \theta. We have replaced “:=” with “=” for clarity.

      General comments for the authors:

      Could the authors discuss the assumptions regarding Drosophila physiology implied in the control model?

      The control model is primarily included as a plausible functional element of the fly’s nervous system, and as such implies minimal assumptions on physiology itself. The main assumption, which is evident from the description of the model components, is that the fly uses proprioceptive feedback information to inform future movements.

      We have added clarifying text to the Results section:

      “While the model is inspired by neuroanatomy, its components do not strictly correspond to components of the nervous system --- the construction of a neuroanatomically accurate model is deferred to future work (see Discussion).”

      The authors acknowledge the absence of ground contact forces in the model. It is probably worth discussing how this simplification may affect inferences regarding the acceptable range of sensorimotor delay in generating realistic walking trajectories.

      We agree, and discuss how some of these assumptions affect the quantitative results in the section “Towards biomechanical and neural realism”. We replicate the relevant sentences below:

      “The inclusion of explicit leg-ground contact interactions would also make it harder for the model to recover when perturbed, because perturbations during walking often occur upon contact with the ground (e.g. the ground is slippery or bumpy).”

      The effects of other simplifications are also mentioned in the same section.

      Can the authors provide an insight into why the use of a second derivative of joint angles as the output of the trajectory generator () leads to more realistic trajectories (4.3.1 Model formulation, paragraph 1)?

      Does the use of a second-order derivative of joint angles lead to drift error because of integration?

      Could the distribution of θd produced be out of the domain due to drift errors? Could this affect the performance of the neural network model approximating the trajectory generator?

      We are not sure why the second derivative works better than the first derivative. It is possible that modeling the system as a second order differential equation gives the network more ability to produce complex dynamics. 

      As can be seen in the example time series in Figures 2 and 3 and supplemental videos, there is no drift error from integration, so it is unlikely to affect the performance of the neural network.

      What does the model's failure (quantified by a low KS score) look like in the context of fly dynamics? What do the joint angles look like for low values of KS score? Does the fly fall down, for example?

      Since the model primarily considers kinematics, a low KS score means that kinematics are unrealistic, e.g. the legs attain unnatural angles or configurations. Examples of this can be seen in videos 4-7 (linked from Appendix 1 of the paper), as well as in the bottom row of Fig. 5, panel A. Here, at 40ms of motor delay, L2 femur rotation is seen to attain values that far exceed the normal ranges. 

      We have added a small clarification in the caption of Fig.5 panel A:

      “low KS indicates that the perturbed walking deviates from data and results in unnatural angles (as seen at 40ms motor delay)”

      We remark that since our simulations do not incorporate contact forces (as the reviewer remarks above, we simulate something like legs moving in the air for a tethered fly), the fly cannot “fall down” per se. However, if forces were incorporated then yes, these unrealistic kinematics would correspond to a fly that falls down or is no longer walking.

      Reviewer #2 (Recommendations For The Authors):

      L49: "Computational models of locomotion do not typically include delay as a tunable parameter, and most existing models of walking cannot sustain locomotion in the presence of delays and external perturbations". This remark confuses the reviewer.

      (1) If models do not "typically" include delay as a tunable parameter, this suggests that atypical models do. Which models do? Please provide references.

      Our initial phrasing was confusing. We meant to say that most models do not include delay, and some models do include delay as a fixed value (rather than a tunable value). We clarify in the updated text, which is replicated below:

      “Computational models of locomotion typically have not included delays as a tunable parameter, although some models have included them as fixed values (Geyer and Herr, 2010; Geijtenbeek et al., 2013).”

      (2) Has the statement that most existing models cannot sustain locomotion with delays been tested? If so, provide references. If not, please remove this statement or temper the language.

      Since most models don’t include delays, they cannot be run in scenarios with delays. We clarify in the updated text, which is replicated below:

      “Computational models of locomotion have not typically included delays. Some have included delay as a fixed value rather than a tunable parameter (Geyer and Herr, 2010; Geijtenbeek et al., 2013). However, in general, the impact of sensorimotor delays on locomotor control and robustness remains an underexplored topic in computational neuroscience.”

      L57: "two of six legs lift off the ground at a time" - Two legs are off the ground at any time, but they do not "lift off" simultaneously in the fruit fly. To lift off simultaneously, contralateral leg pairs would need to be 33% out of phase with one another, but they are almost always 50% out of phase.

      Thank you for pointing out this oversight. We have updated the text accordingly:

      “Flies walk rhythmically with a continuum of stepping patterns that range from tetrapod (where two of six legs are off the ground at a time) to tripod (where three of six legs are off the ground at a time)"

      L88: "a new model of fly walking" - The intention of the authors is to produce a model from which to learn about walking in the fly, is that correct? The reviewer has read the paper several times now and wants to be sure that this is the authors' goal, not to engineer a control system for an animation or a robot.

      Indeed, this is our goal. We were previously unclear about this, and have made text edits to clarify this — we provide a longer response for this in the public review above (see (1)).

      L126: "These desired phases are synchronized across pairs of legs to maintain a tripod coordination pattern, even when subject to unpredictable perturbations." - Does the animal maintain tripod coordination even when perturbed? In the reviewer's experience, flies vary their interleg coordination all the time. The reviewer would also expect that if perturbed strongly (as the supplemental videos show), the animal would adapt its interleg coordination in response. The reviewer finds this assumption to be a weak point in the paper's use of this disturbance for exploring animal locomotion.

      We do not know exactly how flies may react to our mechanical perturbations. However, we may hypothesize based on past papers. 

      Couzin-Fuchs et al (2015) apply a mechanical perturbation to walking cockroaches. They find that the tripod is temporarily broken immediately after the perturbation but the cockroach recovers to a full tripod within one step cycle.

      DeAngelis et al (2019) apply optogenetic perturbations to fly moonwalker neurons that drive backward walking. Flies slow down following perturbation, but then recover after 200ms (about 2-3 steps) to their original speed (on average). 

      Thus, we think it is reasonable to model a fly’s internal phase coupling to maintain tripod and for its intended speed to remain the same even after a perturbation. 

      We do agree with the reviewer that it is plausible a fly might also slow down or even stop after a perturbation and we do not model such cases. We have added some text to the discussion on future work:

      “Future work may also model how higher-level planning of fly behavior interacts with the lower-level coordination of joint angles and legs. Walking flies continuously change their direction and speed as they navigate the environment (Katsov et al, 2017; Iwasaki et al 2024). Past work shows that flies tend to recover and walk at similar speeds following perturbations (DeAngelis et al, 2019), but individual flies might still change walking speed, phase coupling, or even transition to other behaviors, such as grooming. Modeling these higher-level changes in behavior would involve combining our sensorimotor model with models for navigation (Fisher 2022) or behavioral transitions (Berman et al, 2016).”

      L136: "...to output joint torques to the physical model of each leg" - Is this the ultimate output of the nervous system? Muscles are certainly not idealized torque generators. There are dynamics related to activation and mechanics. The reviewer is skeptical that this is a model of neural control in the animal, because the computation of the nervous system would be tuned to account for all these additional dynamics.

      We agree with the reviewer that joint torques are not the ultimate output of the nervous system. We use a torque controller because it is parsimonious, and serves our purpose of creating an interpretable and modular locomotion model.

      We also agree that muscles are an important consideration — we make mention of them later on in the paper under the section “Toward biomechanical and neural realism”, where we state “Another step toward biological realism is the incorporation of explicit dynamical models of proprioceptors, muscles, tendons, and other biomechanical aspects of the exoskeleton.”

      Our goal is not to directly model neural control of the animal. We have introduced text clarifications to emphasize this — we provide a longer response for this in the public review above (see (2)).

      L143: "To train the network from data, we used joint kinematics of flies walking on a spherical treadmill..." This is an impressive approach, but then the reviewer is confused about why the kinematics of the model are so different from those of the animal. The animal takes longer strides at a lower frequency than the model. If the model were trained with data, why aren't they identical? This kind of mismatch makes the reviewer think the approach in this paper is too complicated to address the main problem.

      The design of our trajectory generator model is one of the simplest for reproducing the output of a dynamical system. It consists of a multilayer perceptron model that models the phase velocity and joint angle accelerations at each timestep. All of its inputs are observable and interpretable: the current joint angles, joint angle derivatives, desired walking speed, and phase angle. 
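
      The structure described above (a function that maps the current state to a phase velocity and joint angle accelerations, integrated one timestep at a time) can be sketched as follows. This is only an illustration under stated assumptions: the `generator` function is a hand-written stand-in for the trained multilayer perceptron, it handles a single joint, and its gains, the target waveform, the timestep, and the Euler integration scheme are all hypothetical choices rather than details taken from the paper.

```python
import math

def generator(theta, theta_dot, speed, phase):
    """Stand-in for the trained network (hypothetical dynamics).

    Takes the same kinds of inputs the response lists (joint angle, its
    derivative, desired walking speed, phase) and returns
    (phase velocity, joint angle acceleration).
    """
    phase_velocity = 2.0 * math.pi * speed      # made-up: one cycle per second at speed 1
    target = 0.5 * math.sin(phase)              # made-up phase-dependent target angle (rad)
    theta_ddot = 400.0 * (target - theta) - 40.0 * theta_dot
    return phase_velocity, theta_ddot

def rollout(n_steps=3000, dt=1.0 / 300.0, speed=1.0):
    """Integrate the second-order system one timestep at a time."""
    theta, theta_dot, phase = 0.0, 0.0, 0.0
    trajectory = []
    for _ in range(n_steps):
        phase_velocity, theta_ddot = generator(theta, theta_dot, speed, phase)
        theta_dot += dt * theta_ddot            # integrate acceleration into velocity
        theta += dt * theta_dot                 # integrate velocity into angle
        phase = (phase + dt * phase_velocity) % (2.0 * math.pi)
        trajectory.append(theta)
    return trajectory

trajectory = rollout()
```

      In this toy, the acceleration depends on the current angle and velocity, so the integrated trajectory stays bounded and keeps oscillating; that property belongs to the stand-in dynamics and says nothing about the trained network itself.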

      We chose this model for ease of interpretability, integration with the optimal controller, and to allow for generalization across perturbations. Given all of these constraints, this is the best model of desired kinematics we could obtain. We note that the simulated kinematics do match real fly kinematics qualitatively (Figure 2A and supplemental videos) and are close quantitatively (Figure 2B and C). We speculate that matching the animals’ strides at all walking frequencies may require explicitly modeling differences across individual flies. We leave the design and training of more accurate (but more complex) walking models for future work.

      We add some further discussion about fitting kinematics in the discussion:

      “Although we believe our model matches the fly walking sufficiently for this investigation, we do note that our model still underfits the joint angle oscillations in the walking cycle of the fly (see Figure 2 and Appendix 3). More precise fitting of the joint angle kinematics may come from increasing the complexity of the neural network architecture, improving the training procedure based on advances in imitation learning (Hussein et al., 2018), or explicitly accounting for individual differences in kinematics across flies (Deangelis et al., 2019; Pratt et al., 2024).”

      Figure 2: The reviewer thinks the violin plots in Figure 2C are misleading. Joint angles could be greater or less than 0, correct? If so, why not keep the sign (pos/neg) in the data? Taking the absolute value of the errors and "folding over" the distribution results in some strange statistics. Furthermore, the absolute value would shroud any systematic bias in the model, e.g., joint angles are always too small. The reviewer suggests the authors plot the un-rectified data and simply include 2 dashed lines, one at 5.56 degrees and one at -5.56 degrees.

      These violin plots are averages of errors over all phases within each speed. We chose to do this to summarize the errors across all phase angle plots, which are shown in detail in Appendix 3 and 4.

      For the reviewer, we have added a plot of the raw errors across all phase angle plots in Appendix 5, E.

      L156: Should "\phi\dot" be "\phi"?

      We originally had a typo: we said “phase” when we meant “phase velocity”. This has been fixed. \phi\dot is correct.

      L160: "This control is possible because the controller operates at a higher temporal frequency than the trajectory generator...". This statement concerns the reviewer. To the reviewer, this sounds like the higher-level control system communicates with the "muscles" at a higher frequency than the low-level control system, which conflicts with the hierarchical timescales at which the nervous system operates. Or do the authors mean that the optimal controller can perform many iterations in between updates from the trajectory generator level? If so, please clarify.

      We mean that the optimal controller can perform many iterations in between updates from the trajectory generator level. The text has been clarified:

      “This control is possible because the controller operates at a higher temporal frequency than the trajectory generator in the model. The controller can perform many iterations (and reject disturbances) in between updates to and from the trajectory generator.”
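
      A toy sketch of this two-rate arrangement is given below. Everything in it is an assumption made for illustration: a single scalar joint angle, a proportional correction standing in for the optimal controller, a ramp standing in for the trajectory generator, an update interval of 8 steps, and an impulse applied between two slow updates.

```python
def simulate(n_steps=80, update_every=8, gain=0.5, disturbance_at=20):
    """Fast controller under a slow trajectory generator (hypothetical toy).

    Returns the absolute tracking error after each fast-layer step.
    """
    theta = 0.0      # actual joint angle
    theta_d = 0.0    # desired joint angle supplied by the slow layer
    errors = []
    for t in range(n_steps):
        if t % update_every == 0:
            # Slow layer: refresh the desired angle only every `update_every` steps.
            theta_d = 0.1 * (t // update_every + 1)
        if t == disturbance_at:
            theta += 1.0  # impulse disturbance arriving between slow updates
        # Fast layer: corrects toward theta_d at every timestep.
        theta += gain * (theta_d - theta)
        errors.append(abs(theta_d - theta))
    return errors

errors = simulate()
```

      With these numbers the disturbance lands at step 20 and the next slow update is at step 24; the fast layer shrinks the tracking error over steps 20 to 23 before the slow layer acts again, which is the behaviour the clarified text describes.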

      L225: "We considered two types of perturbations: impulse and persistent stochastic". Are these realistic perturbations? Realistic perturbations such as a single leg slipping, or the body movement being altered would produce highly correlated joint velocities.

      These perturbations are not fully realistic. Nonetheless, the subsequent text in the paper explains how they are analogous to real perturbations, and we restrict our simulations to ranges that would be biologically plausible (see Appendix 7). We agree that realistic perturbations would produce highly correlated joint accelerations and velocities, whereas our perturbations produce random joint accelerations.

      L265: "...but they are difficult to manipulate experimentally..." This is true, but it can and has been done. The authors should cite:

      Bässler, U. (1993). The femur-tibia control system of stick insects-A model system for the study of the neural basis of joint control. Brain Research Reviews, 18(2), 207-226. 

      Thank you for the suggestion, we have incorporated it into the text at the end of the referenced sentence.

      L274: "...since the controller can effectively compensate for large delays by using predictions of joint angles in the future". But can the nervous system do this? Or, is there a reason to think that the nervous system can? The reviewer thinks the authors need stronger justification from the literature for their optimal control layer.

      To clarify, this sentence describes a feature of the model’s behavior when no external perturbations are present. This is not directly relevant to the nervous system, since organisms do not typically exist in an environment free of perturbations — we are not suggesting that the nervous system does this.

      In response to the question of whether the nervous system can compensate for delays using predictions: we know that delays are present in the nervous system, perturbations exist in the environment, and that flies manage to walk in spite of them. Thus, some type of compensation must exist to offset the effects of delays (the reviewer themself has provided some excellent citations that study the effects of delays). In our model, we use prediction as the compensation mechanism — this is one of our central hypotheses. We further discuss this in the section “Predictive control is critical for responding to perturbations due to motor delay”.

      L319: "The formulation of a modular, multi-layered model for locomotor control makes new experimentally-testable hypotheses about fly motor control...". What testable hypotheses are these? The authors should explicitly state them. They are not clear to the reviewer, especially given the nonphysiological nature of the control system and the mechanics.

      A number of testable hypotheses are mentioned throughout the Discussion section:

      “Our model predicts that at the same perturbation magnitude, walking robustness decreases as delays increase. This could be experimentally tested by altering conduction velocities in the fly, for example by increasing or decreasing the ambient temperature (Banerjee et al, 2021).  If a warmer ambient temperature decreases delays in the fly, but fly walking robustness remains the same in response to a fixed perturbation, this would indicate a stronger role for central control in walking than our modeling results suggest.”

      “In our model, robust locomotion was constrained by the cumulative sensorimotor delay. This result could be experimentally validated by comparing how animals with different ratios of sensory to motor delays respond to perturbations. Alternatively, it may be possible to manipulate sensory vs. motor delays in a single animal, perhaps by altering the development of specific neurons or ensheathing glia (Kottmeier et al., 2020). If sensory and motor delays have significantly different effects on walking quality, then additional compensatory mechanisms for delays could play a larger role than we expect, such as prediction through sensory integration, mechanical feedback, or compensation through central control.”

      “we hypothesize that removing proprioceptive feedback would impair an insect's ability to sustain locomotion following external perturbations.”

      “We propose that fly motor circuits may encode predictions of future joint positions, so the fly may generate motor commands that account for motor neuron and muscle delays.”

      L323: "...and biomechanical interactions between the limb and the environment". In the reviewer's experience, the primary determinant of delay tolerance is the mechanical parameters of the limb: inertia, damping, and parallel elasticity. For example, in Ashtiani et al. 2021, equation 5 shows exactly how this comes about: the delay changes the roots and poles of the control system. This is why the reviewer is confused by the complexity of the model in this submission; a simpler model would explain why delays cannot be tolerated in certain circumstances.

      We were previously unclear about the goal of the model, and have made text edits to clarify this — we provide a longer response for this in the public review above (see (1)).

      L362: Another highly relevant reference here would be Sutton et al. 2023.

      Done

      L366: Szczecinski et al. 2018 is hardly a "model"; it is mostly a description of experimental data. How about Goldsmith, Szczecinski, and Quinn 2020 in B&B? Their model of fly walking has pattern-generating elements that are coordinated through sensory feedback. In their model, motor activation is also altered by sensory feedback. The reviewer thinks the statement "Models of fly walking have ignored the role of feedback" is inaccurate and their description of these references should be refined.

      Thank you for the suggestion; we have tempered the language and revised this section to include more references, including the suggested one — text is replicated below. 

      “Many models of fly walking ignore the role of feedback, relying instead on central pattern generators (Lobato-Rios et al., 2022; Szczecinski et al., 2018; Aminzare et al., 2018) or metachronal waves (Deangelis et al., 2019) to model kinematics. Some models incorporate proprioceptive feedback, primarily as a mechanism that alters timing of movements in inter-leg coordination (Goldsmith et al., 2020; Wang-Chen et al., 2023).”

      We remark that Szczecinski et al does include a model that replicates data without using sensory feedback, so we think it is fair to include.  

      L371: "...highly dependent on proprioceptive feedback for leg coordination during walking." What about Berendes et al. 2016, which showed that eliminating CS feedback from one leg greatly diminished its ability to coordinate with the other legs? This suggests that even flies depend on sensory feedback for proper coordination, at least in some sense.

      Interesting suggestion – we have integrated it into the text a little further down, where it better fits:

      “Silencing mechanosensory chordotonal neurons alters step kinematics in walking Drosophila (Mendes et al., 2013; Pratt et al., 2024). Additionally, removing proprioceptive signals via amputation interferes with inter-leg coordination in flies at low walking speeds (Berendes et al., 2016)”

      L426: "The layered model approach also has potential applications for bio-mimetic robotic locomotion.". How fast can this model be computed? Can it run faster than real-time? This would be an important prerequisite for use as a robot control system.

      The model should run quite fast, as it involves only:

      (1) Addition, subtraction, matrix multiplication, and sinusoidal computation on scalars (for the phase coordinator and optimal controller)

      (2) Neural network inference with a relatively small network (for the trajectory generator)

      Whether this can run in real time depends on the hardware capabilities of the specific robot and the frequency requirements; it is possible to run this on a desktop or smaller embedded device.

      We do note that the model needs to first be set up and trained before it can be run, which takes some time (see panel D of Figure 1).

      L432: "...which is a popular technique in robotics.". Please cite references supporting this statement.

      We have added citations: the text and relevant citations are reproduced below:

      “... which is a popular technique in robotics (Hua et al., 2021; Johns, 2021)”

      Hua J, Zeng L, Li G, Ju Z. Learning for a robot: Deep reinforcement learning, imitation learning, transfer learning. Sensors. 2021; 21(4):1278

      Johns E. Coarse-to-fine imitation learning: Robot manipulation from a single demonstration. In: 2021 IEEE international conference on robotics and automation (ICRA) IEEE; 2021. p. 4613–4619

      L509: "We find that the phase offset across legs is not modulated across walking speeds in our dataset". This is a surprising result to the reviewer. Looking at Figure 6C, the reviewer understands that there are no drastic changes in coordination with speed, but there are certainly some changes, e.g., L1-R3, L3-R1. In the reviewer's experience, even very small changes in interleg phasing can change the visual classification of walking from "tripod" to "tetrapod" or "metachronal". Furthermore, several leg pairs do not reside exactly at 0 or \pi radians apart, e.g., L1-L3, L2-L3, R1-R3, R2-R3. In conclusion, the reviewer thinks that setting the interleg coordination to tripod in all cases is a large assumption that requires stronger justification (or, should be eliminated altogether).

      We made a simplifying assumption of a tripod coordination across all speeds. The change in relative phase coordination across speeds is indeed relatively small and additionally we see little change in our results across forward speeds (see Figures 4B, 5C and 5D). 

      We have added text to clarify this assumption and what could be changed for future studies in the methods:

      “We estimate $\bar \phi_{ij}$ from the walking data by taking the circular mean over phase differences of pairs of legs during walking bouts. We find that the phase offset across legs is not strongly modulated across walking speeds in our dataset (see Appendix 2) so we model $\bar \phi_{ij}$ as a single constant independent of speed. In future studies, this could be a function of forward and rotation speeds to account for fine phase modulation differences.”
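
      As an illustration of the circular mean described in the added text, here is a minimal Python sketch. The function names and the input layout (one phase value per time point for each leg, in radians) are assumptions made for this example and are not taken from the manuscript's code.

```python
import math


def circular_mean(angles):
    """Circular mean of angles in radians, returned in [-pi, pi]."""
    s = sum(math.sin(a) for a in angles)
    c = sum(math.cos(a) for a in angles)
    return math.atan2(s, c)


def mean_phase_offset(phase_leg_i, phase_leg_j):
    """Circular mean of the phase difference between two legs.

    phase_leg_i and phase_leg_j are hypothetical inputs: equal-length
    sequences of leg phases (radians) sampled during walking bouts.
    """
    diffs = [a - b for a, b in zip(phase_leg_i, phase_leg_j)]
    return circular_mean(diffs)
```

      An ordinary arithmetic mean would average phases near +\pi and -\pi to roughly 0, whereas the circular mean keeps them near \pi, which is why it suits anti-phase leg pairs.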

      L581: "of dimension...". Should the asterisk be replaced by \times? The asterisk makes the reviewer think of convolution. This change should be made throughout this paragraph.

      Good point, done.

      Figure 6: Rotational velocities in all 3 sections are reported in mm/s, but these units do not make sense. Rotational velocities must be reported in rad/s or deg/s.

      The rotational velocity in mm/s corresponded to the tangential velocity of the ball the fly walked on. We agree that this does not easily generalize across setups, so we have updated the figure to report rotational velocities in rad/s.
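
      The unit conversion can be sketched as follows, assuming the rotational velocity equals the ball's tangential speed divided by the ball's radius. The function name and the numbers in the usage line are hypothetical; the actual ball radius is not stated here.

```python
def tangential_to_angular(v_mm_per_s, ball_radius_mm):
    """Convert a tangential speed on the ball surface (mm/s) to rad/s.

    Uses omega = v / r. ball_radius_mm is a setup-specific value that
    must be supplied by the user; it is an assumption of this sketch.
    """
    if ball_radius_mm <= 0:
        raise ValueError("ball_radius_mm must be positive")
    return v_mm_per_s / ball_radius_mm


# Hypothetical values, for illustration only.
omega = tangential_to_angular(9.0, 4.5)  # 2.0 rad/s
```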

      L619: The reviewer is unconvinced by using only 2 principal components of the data to compare the model and animal kinematics. The authors state on line 626 that the 2 principal components do not capture 56.9% of the variation in the data, which seems like a lot to the reviewer. This is even more extreme considering that the model has 20 joints, and the authors are reducing this to 2 variables; the reviewer can't see how any of the original waveforms, aside from the most fundamental frequencies, could possibly be represented in the PCA dataset. If the walking fly models looked similar to each other, the reviewer could accept that this method works. But the fact that this method says the kinematics are similar, but the motion is clearly different, leads the reviewer to suspect this method was used so the authors could state that the data was a good match.

      Our primary use of the KS metric was to indicate whether the simulated fly continues walking in the presence of perturbations, hence we limited the analysis of the KS to the first 2 principal components. 

      For completeness, we investigate the principal components in Appendix 9 and the effect of evaluating KS with different numbers of components in Appendix 10. 

      The results look similar across components for impulse perturbations. For stochastic perturbations, the range of similar walking decreases as we increase the number of components used to evaluate walking kinematics. Comparing this with Appendix 9 showing that higher components represent higher frequencies of the walking cycle, we conclude that at the edge of stability for delays (where sum of sensory and actuation delays are about 40ms), flies can continue walking but with impaired higher frequencies (relative to no perturbations) during and after perturbation. 

      We add text in the methods:

      “We chose 2 dimensions for PCA for two key reasons. First, these 2 dimensions alone accounted for a large portion of the variance in the data (52.7% total, with 42.1% for the first component and 10.6% for the second component). There was a big drop in variance explained from the first to the second component, but no sudden drop in the next 10 components (see Appendix 9). Second, the KDE procedure only works effectively in low-dimensional spaces, and the minimal number of dimensions needed to obtain circular dynamics for walking is 2. We investigate the effect of varying the number of dimensions of PCA in Appendix 10.”

      (Note that we have corrected the percentage of variance accounted for by the principal components, as these numbers were from an older analysis prior to the first draft.)

      We also reference Appendix 10 in the results:

      “We observed that robust walking was not contingent on the specific values of motor and sensory delay, but rather the sum of these two values (Fig. 5E). Furthermore, as delay increases, higher frequencies of walking are impacted first before walking collapses entirely (Appendix 10).”

    1. eLife Assessment

      This valuable study presents a deep learning framework for predicting synergistic drug combinations for cancer treatment in the AstraZeneca-Sanger (AZS) DREAM Challenge dataset. However, the evidence on the generalizability of the model is incomplete, as part of the validation seems to be flawed by overfitting, and only a modest correlation between predictions and observations was observed in the second, more independent test set. The reported tool, DIPx, could be of use for personalized drug synergy prediction and exploring the activated pathways related to the effects of drug combinations.

    2. Reviewer #1 (Public review):

      The authors introduce DIPx, a deep learning framework for predicting synergistic drug combinations for cancer treatment using the AstraZeneca-Sanger (AZS) DREAM Challenge dataset. While the approach is innovative, I have the following concerns and comments, which hopefully will improve the study's rigor and applicability, making it a more powerful tool in the real clinical world.

      (1) In the abstract: "We trained and validated DIPx in the AstraZeneca-Sanger (AZS) DREAM Challenge dataset using two separate test sets: Test Set 1 comprised the combinations already present in the training set, while Test Set 2 contained combinations absent from the training set, thus indicating the model's ability to handle novel combinations". Test Set 1 comprises combinations already present in the training set, likely leading to an overfitting issue. The model might show inflated performance metrics on this test set due to prior exposure to these combinations, not accurately reflecting its true predictive power on unknown data, which is crucial for discovering new drug synergies. The testing approach reduces the generalizability of the model's findings to new, untested scenarios.

      (2) The model struggles with predicting synergies for drug combinations not included in its training data (showing only Spearman correlation 0.26 in Test Set 2). This limits its potential for discovering new therapeutic strategies. Utilizing techniques such as transfer learning or expanding the training dataset to encompass a wider range of drug pairs could help to address this issue.

      (3) The use of pan-cancer datasets, while offering broad applicability, may not be optimal for specific cancer subtypes with distinct biological mechanisms. Developing subtype-specific models or adjusting the current model to account for these differences could improve prediction accuracy for individual cancer types.

      (4) Line 127, "Since DIPx uses only molecular data, to make a fair comparison, we trained TAJI using only molecular features and referred to it as TAJI-M.". TAJI was designed to use both monotherapy drug-response and molecular data, and likely won't be able to reach its maximum potential if monotherapy drug-response data are removed from the training model. It would be critical to use the same training datasets and then compare the performances. According to Figure 6 of TAJI's paper (Li et al., 2018, PMID: 30054332), the mean Pearson correlation for breast cancer and lung cancer is around 0.5 - 0.6.

      The following 2 concerns have been included in the Discussion section, which is great:

      (1) Training and validating the model using cell lines may not fully capture the heterogeneity and complexity of in vivo tumors. To increase clinical relevance, it would be beneficial to validate the model using primary tumor samples or patient-derived xenografts.

      (2) The Pathway Activation Score (PAS) is derived exclusively from primary target genes, potentially overlooking critical interactions involving non-primary targets. Including these secondary effects could enhance the model's predictive accuracy and comprehensiveness.

      Comments on revisions:

      The authors replied to my concerns but they did not address my comments/concerns. Especially for my concern #1: They trained and validated DIPx in the AstraZeneca-Sanger (AZS) DREAM Challenge dataset using two separate test sets: Test Set 1 comprised the combinations already present in the training set. Therefore, Test Set 1 comprises combinations already present in the training set, likely leading to an overfitting issue, but they claimed "There is no danger overfitting here" in their "Author Response" letter.

      All my other concerns are unchanged too.

    3. Reviewer #2 (Public review):

      Trac, Huang, et al used the AZ Drug Combination Prediction DREAM challenge data to make a new random forest-based model for drug synergy. They make comparisons to the winning method and also show that their model has some predictive capacity for a completely different dataset. They highlight the ability of the model to be interpretable in terms of pathway and target interactions for synergistic effects.

      In their revised manuscript, the authors attempt to address the points raised about a comparison to the full TAJI model and showing how molecular can be integrated into DIPx.

      (1) Their argument that "Using only molecular data allows for more convenient and intuitive inference of pathway importance compared to integrating multiple data types" is unconvincing. It's not clear how adding a data source here confounds pathway inference. They need to add examples.

      (2) They have revised the method of calculating p-values instead of bootstrapping them, so the new numbers appear a lot more meaningful now.

      (3) The performance on the O'Neill dataset shows the limitations of their training regime and shows the limits of the model in terms of picking new drug combinations. I would argue that is the very definition of overfitting, not being able to model any combination it has never seen.

    4. Reviewer #3 (Public review):

      Summary:

      Predicting how two different drugs act together by looking at their specific gene targets and pathways is crucial for understanding the biological significance of drug combinations. This study incorporates drug-specific pathway activation scores (PASs) to estimate synergy scores as one of the key advancements for synergy prediction. The new algorithm, Drug synergy Interaction Prediction (DIPx), developed in this study, uses gene expression, mutation profiles, and drug synergy data to train the model and predict synergy between two drugs. Comprehensive comparisons with another best-performing algorithm, TAIJI-M, highlight the potential of its capabilities.

      Strengths:

      DIPx uses target and driver genes to elucidate pathway activation scores (PASs) to predict drug synergy. This approach integrates gene expression, mutation profiles, and drug synergy data to capture information about the functional interactions between drug targets, thereby providing a potential biological explanation for the synergistic effects of combined drugs. DIPx's performance was tested using the AstraZeneca-Sanger (AZS) DREAM Challenge dataset, especially in Test Set 1, where the Spearman correlation coefficient between predicted and observed drug synergy was 0.50 (95% CI: 0.47-0.53). DIPx's ability to handle novel combinations, as evidenced by its performance in Test Set 2, indicates its potential for predictions of new and untested drug combinations.

      Weaknesses:

      While the DIPx algorithm shows promise in predicting drug synergy based on pathway activation scores, it's essential to consider its limitations. One limitation is that the availability of training data for specific drug combinations may influence its predictive capability. Further testing and experimental validation of the predictions in future studies would be necessary to fully assess the algorithm's generalizability and robustness.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The authors introduce DIPx, a deep learning framework for predicting synergistic drug combinations for cancer treatment using the AstraZeneca-Sanger (AZS) DREAM Challenge dataset. While the approach is innovative, I have the following concerns and comments which hopefully will improve the study's rigor and applicability, making it a more powerful tool in the real clinical world.

      We thank the reviewer for recognizing the innovative aspects of DIPx and for sharing their valuable comments to further refine and strengthen our study. Those comments are carefully addressed in the following point-by-point response.

      (1) Test Set 1 comprises combinations already present in the training set, likely leading to an overfitting issue. The model might show inflated performance metrics on this test set due to prior exposure to these combinations, not accurately reflecting its true predictive power on unknown data, which is crucial for discovering new drug synergies. The testing approach reduces the generalizability of the model's findings to new, untested scenarios.

      From a clinical perspective, it is useful to test whether a known (previously tested) combination can work for a new patient, which is the purpose of Test Set 1. There is no danger overfitting here, because the test set is completely independent of the discovery set, so had we only discovered a false positive the test set would not have more power than expected under the null. Predicting the effectiveness of unknown drug combinations (Test Set 2) is indeed an important and more challenging goal of synergy prediction, but it is statistically a distinct problem. The two test sets were previously designed by the AZS DREAM Challenge [PMID: 31209238].
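
      To make the distinction between the two test sets concrete, the sketch below partitions held-out records by whether their drug pair appears in the training data. It is only an illustration: the record layout and field names (`drug_a`, `drug_b`, `cell_line`) are hypothetical, and the actual Test Set 1 and Test Set 2 were predefined by the challenge rather than derived this way.

```python
def split_by_seen_combination(train_records, test_records):
    """Partition test records into (known, novel) drug combinations.

    Each record is a dict with hypothetical keys "drug_a" and "drug_b".
    A combination is treated as unordered, so (A, B) matches (B, A).
    """
    seen = {frozenset((r["drug_a"], r["drug_b"])) for r in train_records}
    known, novel = [], []
    for r in test_records:
        pair = frozenset((r["drug_a"], r["drug_b"]))
        (known if pair in seen else novel).append(r)
    return known, novel
```

      In this framing, the "known" group asks whether a previously tested combination works in a new sample, while the "novel" group asks whether the model extrapolates to untested pairs.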

      We have performed cross-validation on the dataset and demonstrated that the result of DIPx for Test Set 1 is not overfitting. Indeed, Figure 2—figure supplement 1 shows the 10-fold cross validation results for the training set. The median Spearman correlation between the predicted and observed Loewe scores across the 10 folds of cross-validation is 0.48, which is close to the correlation of 0.50 in Test Set 1 (red star).  We have added the cross-validation results to the “Validation and Comparisons in the AZS Dataset” section (page 4). 
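
      The cross-validation summary described above (a median Spearman correlation across folds) can be sketched as follows. This is a generic illustration using only the Python standard library, not the authors' pipeline: `fit` and `predict` are placeholder callables supplied by the user, and the fold assignment is a simple shuffled split chosen for this example.

```python
import random
from statistics import mean, median


def _ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    rx, ry = _ranks(x), _ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / (vx * vy) ** 0.5


def kfold_median_spearman(X, y, fit, predict, k=10, seed=0):
    """Median held-out Spearman correlation over k folds."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        model = fit([X[i] for i in train], [y[i] for i in train])
        preds = predict(model, [X[i] for i in held_out])
        scores.append(spearman(preds, [y[i] for i in held_out]))
    return median(scores)
```

      Note that a random split like this one does not by itself separate known from novel drug combinations, so it addresses a different question from the Test Set 2 evaluation.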

      (2) The model struggles with predicting synergies for drug combinations not included in its training data (showing only a Spearman correlation of 0.26 in Test Set 2). This limits its potential for discovering new therapeutic strategies. Utilizing techniques such as transfer learning or expanding the training dataset to encompass a wider range of drug pairs could help to address this issue.

      We agree that this is an important limitation for the discovery of new therapeutic strategies. While transfer learning or expanding the training dataset could indeed help address this issue, implementing these approaches would require access to more comprehensive data, which is currently limited due to the scarcity of drug combination datasets. As more drug combination data become available in future, we plan to expand the training set to better cover a wider range of drug combinations and apply the transfer learning method to improve prediction accuracy. We have added a discussion on this in the Discussion Section.

      (3) The use of pan-cancer datasets, while offering broad applicability, may not be optimal for specific cancer subtypes with distinct biological mechanisms. Developing subtype-specific models or adjusting the current model to account for these differences could improve prediction accuracy for individual cancer types.

      We agree with the reviewer that the current settings of DIPx might not be optimal for specific cancers due to cancer heterogeneity. However, building subtype-specific models is currently constrained by limited data availability, which in turn restricts their predictive power. In the Discussion section, we mention this as one of DIPx's limitations and suggest future improvements in cancer-specific models.

      (4) Line 127, "Since DIPx uses only molecular data, to make a fair comparison, we trained TAJI using only molecular features and referred to it as TAJI-M.". TAJI was designed to use both monotherapy drug-response and molecular data, and likely won't be able to reach maximum potential if removing monotherapy drug-response from the training model. It would be critical to use the same training datasets and then compare the performances. From Figure 6 of TAJI's paper (Li et al., 2018, PMID: 30054332) , i.e., the mean Pearson correlation for breast cancer and lung cancer is around 0.5 - 0.6.

      It is true that using monotherapy drug responses can enhance the performance of TAIJI as described in its original paper. In fact, TAIJI builds separate prediction modules for molecular data and monotherapy drug-response data, then combines their results to obtain the final prediction. In our paper we prioritize the exploration of molecular mechanisms in drug combinations while achieving performance comparable to the molecular model of TAIJI. DIPx can be expected to achieve similarly improved performance if we integrate the monotherapy drug response data using the same approach.

      My major concerns were listed in the public review. Here are some writing issues:

      (5) Some content in the Results section looks like a discussion: i.e, L129, "The extra information from the use of monotherapy data in TAJI is rather small, approximately 10% increase in the overall Spearman correlation, and, of course, we could also use such data in DIPx, so it is more convenient and informative to focus the comparisons on prediction based on molecular data alone."; L257, "As we discuss above, to get synergy, the two drugs in a combination theoretically should not have the same target. However, there is of course no guarantee that two drugs that do not share target genes can produce synergy. ".

      We have revised the texts and moved them to the Discussion section.  

      Reviewer #2 (Public Review):

      Trac, Huang, et al used the AZ Drug Combination Prediction DREAM challenge data to make a new random forest-based model for drug synergy. They make comparisons to the winning method and also show that their model has some predictive capacity for a completely different dataset. They highlight the ability of the model to be interpretable in terms of pathway and target interactions for synergistic effects. While the authors address an important question, more rigor is required to understand the full behavior of the model.

      We thank the reviewer for his/her time and effort in carefully reading the manuscript and acknowledging the significance of the study.

      Major Points

      (1) The authors compare DIPx to the winning method of the DREAM challenge, TAJI. To compare on molecular features alone, they retrain TAJI without the monotherapy data inputs to create TAJI-M. They mention that "of course, we could also use such data in DIPx...", but they never show the behaviour of DIPx with these data. The authors need to demonstrate that this statement holds true or else compare it to the full TAJI.

      This is similar to point 4 raised by Reviewer 1 regarding the exclusive use of molecular data in DIPx. In fact, TAIJI uses separate prediction modules for molecular data and drug-response data which are then combined to obtain the final results. While integrating monotherapy drug data could enhance DIPx’s overall performance, for example, simply replacing TAIJI’s molecular model with DIPx in the full TAIJI to achieve comparable results, this is not the primary goal of DIPx. Our focus is on exploring the potential molecular mechanisms of drug action. Using only molecular data allows for more convenient and intuitive inference of pathway importance compared to integrating multiple data types.

      We have revised the related text with the discussion in section “Validation and comparisons in the AZS dataset” of the main text.

      (2) It would be neat to see how the DIPx feature importance changes with monotherapy input. For most realistic scenarios in which these models are used robust monotherapy data do exist.

      Indeed, some existing models incorporate monotherapy data into their predictions; for example, a recent study [PMID: 33203866] uses only monotherapy data to predict drug combinations. TAIJI, as discussed in Point 1, uses separate models for monotherapy and molecular data. In general, both data types can be integrated into a single prediction model, allowing for the consideration of feature importance from both. While such an approach can highlight features contributing to predictive performance, the significance of a monotherapy feature does not necessarily indicate the activated pathways of a synergistic drug combination, which is the primary focus of our study. For this reason, we have excluded monotherapy data from DIPx.

      (3) In Figure 2, the authors compare DIPx and TAJI-M on various test sets. If I understood correctly, they also bootstrapped the training set with n=100 and reported all the model variants in many of the comparisons. While this is a nice way of showing model robustness, calculating p-values with bootstrapped data does not make sense in my opinion as by increasing the value of n, one can make the p-value arbitrarily small.

      The p-value should only be reported for the original models.

      The reviewer is correct that we cannot compute the p-value by using an independent two-sample test, because the bootstrap correlation values are based on the same data. However, p-values can still be computed to compare the two prediction models using the bootstrap. Theoretically, the bootstrap can be used to compute a confidence interval for the differential correlation in the test set. However, there is a close relationship between p-values and confidence intervals (see Pawitan, 2001, chapter 5; particularly p.134). Specifically, in this case, we compute the p-value as follows:

      (1) For each bootstrap, (i) compute the Spearman correlation between the predicted and observed scores in the test set for DIPx and TAIJI-M, and denote these by r1 and r2; (ii) compute the difference in the Spearman correlations, d = r1 - r2.

      (2) Repeat the bootstrap n=100 times.

      (3) Compute the minimum of these two proportions: the proportion of d<0 and the proportion of d>0.

      (4) The two-sided p-value is 2x the minimum proportion in (3).

      To overcome the limited bootstrap sample size, we use the normal approximation in computing the proportions in (3). Note that in this method of computing the p-value, larger numbers of bootstrap replicates do not produce more significant results.
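
      Steps (3) and (4) with the normal approximation can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' code: the function name is hypothetical, and the input is assumed to be the list of per-replicate differences d = r1 - r2 already obtained from the bootstrap.

```python
from statistics import NormalDist, mean, stdev


def bootstrap_two_sided_p(diffs):
    """Two-sided p-value from bootstrap differences d = r1 - r2.

    The proportions of d < 0 and d > 0 are approximated by a normal
    distribution fitted to the bootstrap differences, and the p-value
    is twice the smaller of the two proportions.
    """
    m = mean(diffs)
    s = stdev(diffs)
    if s == 0:
        raise ValueError("bootstrap differences have zero variance")
    p_below = NormalDist(mu=m, sigma=s).cdf(0.0)  # approx. proportion of d < 0
    p_above = 1.0 - p_below                       # approx. proportion of d > 0
    return 2.0 * min(p_below, p_above)
```

      Because the fitted mean and standard deviation estimate the sampling distribution of d rather than the mean of many replicates, adding more bootstrap replicates stabilizes the estimate without shrinking the p-value toward zero.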

      We have re-computed the p-values using this method and added this text to the ‘Methods and Materials’ Section. 

      (4) From Figures 2 and 3, it appears DIPx is overfit on the training set with large gaps in Spearman correlations between Test Set 2/ONeil set and Test Set 1. It also fares much better in cases where it has seen both compounds. Could the authors also compare TAJI on the ONeil dataset to show if it is as much overfit?

      The poor performance in the ONeil dataset is not due to overfitting as such, but more likely due to structural differences between the training and ONeil datasets. (To investigate the overfitting issue, we have conducted a 10-fold cross validation in the AZS training set. The median correlation between the predicted and observed Loewe score across ten folds is 0.48, which is comparable to the median of 0.50 in the Test Set 1. Therefore, the model does not suffer from an overfitting issue. We have added this cross-validation result in the Section “Validation and Comparisons in the AZS Dataset” (page 4)).

      We have now obtained TAIJI’s results on the ONeil dataset. TAIJI-M relies on a gene-gene interaction network to integrate the indirect drug targeting effects. This approach limits its applicability to new datasets, as it can only predict synergy scores for drug combinations present in the training dataset. Among the set of drug combinations present in the training set (n = 1102), both DIPx and TAIJI-M perform poorly, with Spearman correlations between predicted and observed synergy scores of 0.09 and 0.05, respectively.

      (Additional note: The original version of TAIJI-M uses gene expression, CNV, mutation, and methylation data. However, there is no methylation data in the ONeil dataset, so we retrained TAIJI-M without the methylation features. According to the final report of TAIJI in the challenge (https://www.synapse.org/Synapse:syn5614689/wiki/396206), Guan et al. reported that methylation features do not contribute to prediction performance in the post-challenge analysis. This means that retraining TAIJI-M without the methylation data will not materially affect the comparison between DIPx and TAIJI-M on the ONeil dataset.)

      Minor Points:

      (5) Pg 4, line 130: Citation needed for 10% contribution of monotherapy.

      (6) The general language of this paper is informal at times. I request the authors to refine it a bit.

      We thank the reviewer for pointing this out. We have added the appropriate citation for the statement and carefully revised the text to make it more formal.

      Reviewer #3 (Public Review):

      Summary:

      Predicting how two different drugs act together by looking at their specific gene targets and pathways is crucial for understanding the biological significance of drug combinations. Such combinations of drugs can lead to synergistic effects that enhance drug efficacy and decrease resistance. This study incorporates drug-specific pathway activation scores (PASs) to estimate synergy scores as one of the key advancements for synergy prediction. The new algorithm, Drug synergy Interaction Prediction (DIPx), developed in this study, uses gene expression, mutation profiles, and drug synergy data to train the model and predict synergy between two drugs and suggests the best combinations based on their functional relevance on the mechanism of action. Comprehensive validations using two different datasets and comparing them with another best-performing algorithm highlight the potential of its capabilities and broader applications. However, the study would benefit from including experimental validation of some predicted drug combinations to enhance its reliability.

      Strengths:

      The DIPx algorithm demonstrates the strengths listed below in its approach for personalized drug synergy prediction. One of its strengths lies in its utilization of biologically motivated cancer-specific (driver genes-based) and drug-specific (target genes-based) pathway activation scores (PASs) to predict drug synergy. This approach integrates gene expression, mutation profiles, and drug synergy data to capture information about the functional interactions between drug targets, thereby providing a potential biological explanation for the synergistic effects of combined drugs. Additionally, DIPx's performance was tested using the AstraZeneca-Sanger (AZS) DREAM Challenge dataset, especially in Test Set 1, where the Spearman correlation coefficient between predicted and observed drug synergy was 0.50 (95% CI: 0.47-0.53). This demonstrates the algorithm's effectiveness in handling combinations already in the training set. Furthermore, DIPx's ability to handle novel combinations, as evidenced by its performance in Test Set 2, indicates its potential for extrapolating predictions to new and untested drug combinations. This suggests that the algorithm can adapt to and make accurate predictions for previously unencountered combinations, which is crucial for its practical application in personalized medicine. Overall, DIPx's integration of pathway activation scores and its performance in predicting drug synergy for known and novel combinations underscore its potential as a valuable tool for personalized prediction of drug synergy and exploration of activated pathways related to the effects of combined drugs.

      Weaknesses:

      While the DIPx algorithm shows promise in predicting drug synergy based on pathway activation scores, it's essential to consider its limitations. One limitation is that the algorithm's performance was less accurate when predicting drug synergy for combinations absent from the training set. This suggests that its predictive capability may be influenced by the availability of training data for specific drug combinations. Additionally, further testing and validation across different datasets (more than the current two datasets) would be necessary to assess the algorithm's generalizability and robustness fully. It's also important to consider potential biases in the training data and ensure that DIPx predictions are validated through empirical studies including experimental testing of predicted combinations. Despite these limitations, DIPx represents a valuable step towards personalized prediction of drug synergy and warrants continued investigation and improvement. It would benefit if the algorithm's limitations are described with some examples and suggest future advancement steps.

      We are grateful to the reviewer for the thoughtful and encouraging comments, and for the time and effort to read our manuscript. We have carefully addressed them in our revision.

      Reviewer #3 (Recommendations For The Authors):

      The authors could consider some of the recommendations below to further improve the DIPx algorithm and its application in personalized drug synergy prediction. Firstly, expanding the training dataset to include a broader range of drug combinations could improve the algorithm's predictive capabilities, especially for novel combinations. This would help address the observed decrease in performance when predicting drug synergy for combinations absent from the training set. This could help assess the robustness of the algorithm and provide a more comprehensive evaluation of its performance for untrained combinations to strengthen its application.

      We agree that expanding the training dataset with a broader range of drug combinations would likely improve performance. However, the vast number of possible combinations, along with the associated cost of the experiment, limits the availability of drug combination data. To increase the size of the training data, we could combine different studies, but data from different studies are often generated using different protocols and experimental settings, introducing biases that complicate the integration. As technology continues to advance, we anticipate that more standardized and comprehensive data will become available in the future, which will help address this issue.

      Furthermore, the authors may consider incorporating additional features or data sources, such as drug-specific characteristics, i.e., availability of the drug, to enrich the information utilized by the algorithm. This could potentially improve the accuracy of the predictions and provide a more holistic understanding of the factors contributing to drug synergy.

      Indeed, incorporating additional information such as monotherapy data and drug-specific characteristics, as in TAIJI’s approach, could enhance overall prediction performance. As discussed in Point 5 below, the current study is focused on exploring the potential molecular mechanisms of drug combinations, rather than optimizing overall prediction accuracy. However, in its application, it is natural to add the monotherapy or drug-specific information into the algorithm, as done in TAIJI.

      Finally, conducting experimental studies to validate the predictions generated by DIPx in laboratory-based cell lines would be essential to confirm its accuracy and reliability. This could involve a few drug IC50 experimental validations of predicted synergistic drug combinations and their associated pathway activations to strengthen the algorithm's clinical relevance. By considering these recommendations, the authors can further refine and advance the DIPx algorithm.

      We agree that laboratory-based validation, such as IC50 experiments for predicted synergistic drug combinations and pathway activations, would indeed strengthen the clinical relevance of the algorithm. We hope future studies can build on this work by incorporating this experimental validation.

      Below are my specific comments:

      Major comments:

      (1) The outputs of the DIPx algorithm are not clearly explained. It is unclear whether it provides only the Loewe score, the confidence score, the PAS score, or all of them. It is necessary to clarify the output of the proposed algorithm to guide the reader on what to expect while using it. The steps from PASs to synergy scores are not well explained.

      We apologize for the lack of clarity. Regarding the outputs of DIPx, for any triplet (drug A + drug B, cell line C), DIPx provides both the predicted Loewe score and the corresponding confidence score as the output. PASs are used as the input data for the random forest algorithm, which processes PASs into the synergy score. We do not provide the details in the manuscript, but refer readers to the article by Ishwaran H et al. (2021). We have revised the first paragraph of the 'A Pathway-Based Drug Synergy Prediction Model' section (page 3) and Figure 1 to improve the presentation of the method.

      (2) In Figure 1, the predicted Loewe score for the Capivasertib + Sapitinib combination is not provided. However, Figures 1e and 4a show the pathways with the highest contribution for this combination. What is the predicted Loewe score for the Capivasertib + Sapitinib combination?

      Figures 1e and 4a present the pathways with the highest contribution for the combination, which are identified based on the drug-combination data from 12 cell lines rather than a single data point.

      We have added the median Loewe score (=7.6) across 12 cell lines in the test sets (Test 1 + Test 2) for the Capivasertib + Sapitinib combination in Figure 1e and reported related information for this combination in Supplementary Table S1. Additionally, we revised the 'Inference of the Mechanism of Action Based on PAS' section (page 7) to clarify the pathway importance inference.

      (3) In Figure 1d, the combination of doxorubicin + AZ12623380 is predicted to exhibit high Loewe synergy, with a confidence score of 0.33. It is important to provide details of this prediction, including the pathway predictions, and to explain why the model suggested high synergy. Although Figure 4f contains information, it seems to be listed for the observed Loewe score rather than the predicted score provided in Figure 1d. DIPx predicts the doxorubicin + AZ12623380 combination to be synergistic, while in Figure 4, it is labeled as a non-synergistic combination. It is necessary for the authors to clearly indicate which illustration represents the predicted outcome and which hypothesis is based on the observed Loewe score.

      In Figure 1d, we reported both the predicted and observed Loewe scores for the experiment (combination = doxorubicin + AZ12623380, cell line = SW900). Although the predicted score is high, a confidence score of 0.33 indicates a low chance that the combination is synergistic. This is indeed confirmed by the non-synergistic observed score of -6, so the combination does not merit further investigation. This example highlights the value of the confidence score as a supplement to the predicted values.

      (4) Figure 3 - The external validation using ONeil requires more rigorous analysis to understand the biological significance of the predictions. It is important to provide pathway activation scores and their potential mechanism of action predicted by the DIPx algorithm when working with a new dataset. Additionally, including the predictions of TAIJI-M on the ONeil dataset would be beneficial for comparing the performance of both algorithms on a new dataset.

      We have included an example of potential pathways related to the MK2206 + Erlotinib combination in the ONeil cohort, as inferred by DIPx, in the last paragraph of the 'Inference of the Mechanism of Action Based on PAS' section (page 9). In this example, we identify 'Metabolism by CYP Enzymes' as the most significant pathway associated with this combination, which aligns with previous studies showing that both MK2206 and Erlotinib are metabolized by the CYP enzyme families [PMID: 24387695].

      Regarding the prediction of TAIJI-M on the ONeil dataset, we received a similar request in question 4 from Reviewer 2, which we have carefully addressed above. Briefly, due to differences between the two datasets, we retrained TAIJI-M without methylation data to enable prediction on the ONeil dataset. (As previously reported, methylation data did not significantly contribute to the results of TAIJI, and TAIJI-M can only predict synergy scores for drug combinations present in the training set.) Focusing on this subset of drug combinations, both TAIJI-M and DIPx perform poorly, with Spearman correlations of r=0.05 and r=0.09, respectively. The poor performance could be attributed to the limited overlap of drugs between the ONeil dataset and the AZS DREAM Challenge dataset.

      (5) TAIJI by Li et al., 2018 reported a high prediction correlation (0.53) in their study, while the modified version of TAIJI, TAJI-M, shows a lower prediction correlation in this study. The authors should clarify why the performance decreased when using the same dataset. Is it because only molecular data was used, excluding the monotherapy drug-response data? There is a spelling error in calling the algorithm - it is reported as TAIJI by Li et al., 2018, whereas this study calls it TAJI - an "I" is missing in TAIJI throughout the manuscript.

      Indeed, TAIJI-M has a lower prediction correlation (0.38) compared to the full TAIJI model (0.53), which includes the monotherapy data. Some studies such as [PMID: 33203866] even use only monotherapy data to predict drug combinations, suggesting the importance of monotherapy data in drug-combination prediction. However, DIPx focuses on exploring the potential molecular mechanisms of drug combinations rather than on overall prediction results; we therefore exclude the monotherapy data from the analysis. We have discussed this in the 'Validation and Comparisons in the AZS Dataset' section (page 4).

      We thank the reviewer for pointing out the spelling error for TAIJI; this has been corrected throughout the manuscript.

      (6) The authors should provide the predicted versus observed Loewe scores for all the combinations as a supplementary file. This would benefit the readers who want to replicate the results in the future. In the same way, including a sample output for the toy dataset on GitHub is required to assess the performance of the DIPx algorithm by a new user.

      All predicted and observed drug synergy scores are given in Supplementary Table S2. We have also uploaded a simple example to our GitHub page, along with detailed instructions for users on how to run the method, including generating PAS and training the prediction model. Since we do not have permission to host data from the AZS DREAM Challenge and the ONeil datasets on our GitHub page, users can download these datasets separately and directly apply the provided code.

      (7) GitHub can include all the input and output data to reproduce the correlation plots in the manuscript. GitHub could also include the modified version of TAIJI-M and its corresponding input for comparison. The methods section should include how TAIJI was performed.

      We have uploaded all the code and related data to the GitHub page to allow replication of all correlation plots in the manuscript. TAIJI-M represents the molecular model of the full TAIJI model. Both TAIJI-M and TAIJI are documented on the GitHub page of the original study. We have also included a link to the source code for TAIJI-M and TAIJI in the 'Data Availability' section.

      (8) Figure 5 - the data associated with this figure needs to be provided as supplementary listing the predicted values of Loewe scores for all the combinations.

      We report the associated data including the median of predicted and observed Loewe scores related to Figure 5c in Supplementary Table S2.

      Minor comments:

      (9) Abbreviations for the pathways are not included.

      We have included a list of abbreviations for all relevant pathways in Supplementary Table S5.

      (10) Line: 369. What is considered as bias correction? This needs to be explained.

      Bias correction refers to adjusting the original estimate of the Spearman correlation between the predicted and observed Loewe scores when there is a systematic difference between the estimates obtained from the bootstrap samples and the original correlation estimate. We revised the related text on page 13 to improve the explanation.

      (11) Line 364. Formulae or details for calculating actual predicted synergy (Ps) are missing.

      The predicted Loewe score, Ps, is the output of the regression random forest model. For simplicity, we do not describe the details in the manuscript, but refer readers to the method article (Ishwaran H et al., 2021). We have revised the text accordingly.
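
      The bias correction described in the response to comment (10) can be illustrated with a minimal sketch. It assumes the standard bootstrap bias estimate (mean of the bootstrap correlations minus the original correlation), which may differ in detail from the procedure used in the manuscript; the function names, the number of resamples, and the seed are hypothetical choices made only for illustration.

```python
import random
from statistics import mean


def ranks(values):
    """Return 1-based average ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input has no variation."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman correlation computed as the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def bias_corrected_spearman(predicted, observed, n_boot=1000, seed=0):
    """Return (corrected estimate, estimated bias) using paired bootstrap resampling.

    Assumption: bias = mean(bootstrap estimates) - original estimate,
    and the corrected value is original - bias.
    """
    rng = random.Random(seed)
    n = len(predicted)
    original = spearman(predicted, observed)
    boot = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample pairs with replacement
        boot.append(spearman([predicted[i] for i in idx],
                             [observed[i] for i in idx]))
    bias = mean(boot) - original
    return original - bias, bias
```

      Here `predicted` and `observed` would be the paired predicted and observed Loewe scores; resampling the pairs together preserves the association between them within each bootstrap sample.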

    1. eLife Assessment

      The authors have generated important resources such as a reference dataset of early primate development by utilizing single-cell transcriptomic technology together with induced pluripotent stem cells (iPSCs) from four primate species: humans, orangutans, cynomolgus macaques, and rhesus macaques. By analyzing marker gene expression and cell types across species during undirected differentiation of iPSCs, the authors provide solid evidence that the transferability of marker genes decreases as the evolutionary distance between species increases. This work demonstrates the extended usage of iPSCs for broader fields, which will benefit several scientific communities including anthropology, comparative biology, and evolutionary biology.

    2. Reviewer #1 (Public review):

      Summary:

      Jocher, Janssen, et al examine the robustness of comparative functional genomics studies in primates that make use of induced pluripotent stem cell-derived cells. Comparative studies in primates, especially amongst the great apes, are generally hindered by the very limited availability of samples, and iPSCs, which can be maintained in the laboratory indefinitely and differentiated into other cell types, have emerged as promising model systems because they allow the generation of data from tissues and cells that would otherwise be unobservable.

      Undirected differentiation of iPSCs into many cell types at once, using a method known as embryoid body differentiation, requires researchers to manually assign all cell types in the dataset so they can be correctly analysed. Typically, this is done using marker genes associated with a specific cell type. These are defined a priori, and have historically tended to be characterised in mice and humans and then employed to annotate other species. Jocher, Janssen, et al ask if the marker genes and features used to define a given cell type in one species are suitable for use in a second species, and then quantify the degree of usefulness of these markers. They find that genes that are informative and cell type specific in a given species are less valuable for cell type identification in other species, and that this value, or transferability, drops off as the evolutionary distance between species increases.

      This paper will help guide future comparative studies of gene expression in primates (and more broadly) as well as add to the growing literature on the broader challenges of selecting powerful and reliable marker genes for use in single-cell transcriptomics.

      Strengths:

      Marker gene selection and cell type annotation is a challenging problem in scRNA studies, and successful classification of cells often requires manual expert input. This can be hard to reproduce across studies, as, despite general agreement on the identity of many cell types, different methods for identifying marker genes will return different sets of genes. The rise of comparative functional genomics complicates this even further, as a robust marker gene in one species need not always be as useful in a different taxon. The finding that so many marker genes have poor transferability is striking, and by interrogating the assumption of transferability in a thorough and systematic fashion, this paper reminds us of the importance of systematically validating analytical choices. The focus on identifying how transferability varies across different types of marker genes (especially when comparing TFs to lncRNAs), and on exploring different methods to identify marker genes, also suggests additional criteria by which future researchers could select robust marker genes in their own data.

      The paper is built on a substantial amount of clearly reported and thoroughly considered data, including EBs and cells from four different primate species - humans, orangutans, and two macaque species. The authors go to great lengths to ensure the EBs are as comparable as possible across species, and take similar care with their computational analyses, always erring on the side of drawing conservative conclusions that are robustly supported by their data over more tenuously supported ones that could be impacted by data processing artefacts such as differences in mappability, etc. For example, I like the approach of using liftoff to robustly identify genes in non-human species that can be mapped to and compared across species confidently, rather than relying on the likely incomplete annotation of the non-human primate genomes. The authors also provide an interactive data visualisation website that allows users to explore the dataset in depth, examine expression patterns of their own favourite marker genes and perform the same kinds of analyses on their own data if desired, facilitating consistency between comparative primate studies.

      Weaknesses and recommendations:

      (1) Embryoid body generation is known to be highly variable from one replicate to the next for both technical and biological reasons, and the authors do their best to account for this, both by their testing of different ways of generating EBs, and by including multiple technical replicates/clones per species. However, there is still some variability that could be worth exploring in more depth. For example, the orangutan seems to have differentiated preferentially towards cardiac mesoderm whereas the other species seemed to prefer ectoderm fates, as shown in Figure 2C. Likewise, Supplementary Figure 2C suggests a significant unbalance in the contributions across replicates within a species, which is not surprising given the nature of EBs, while Supplementary Figure 6 suggests that despite including three different clones from a single rhesus macaque, most of the data came from a single clone. The manuscript would be strengthened by a more thorough exploration of the intra-species patterns of variability, especially for the taxa with multiple biological replicates, and how they impact the number of cell types detected across taxa, etc.

      The same holds for the temporal aspect of the data, which is not really discussed in depth despite being a strength of the design. Instead, days 8 and 16 are analysed jointly, without much attention being paid to the possible differences between them. Are EBs at day 16 more variable between species than at day 8? Is day 8 too soon to do these kinds of analyses? Are markers for earlier developmental progenitors better/more transferable than those for more derived cell types?

      (2) Closely tied to the point above, by necessity the authors collapse their data into seven fairly coarse cell types and then examine the performance of canonical marker genes (as well as those discovered de novo) across the species. However some of the clusters they use are somewhat broad, and so it is worth asking whether the lack of specificity exhibited by some marker genes and driving their conclusions is driven by inter-species heterogeneity within a given cluster.

    3. Reviewer #2 (Public review):

      Summary:

      The authors present an important study on identifying and comparing orthologous cell types across multiple species. This manuscript focuses on characterizing cell types in embryoid bodies (EBs) derived from induced pluripotent stem cells (iPSCs) of four primate species, humans, orangutans, cynomolgus macaques, and rhesus macaques, providing valuable insights into cross-species comparisons.

      Strengths:

      To achieve this, the authors developed a semi-automated computational pipeline that integrates classification and marker-based cluster annotation to identify orthologous cell types across primates. This study makes a significant contribution to the field by advancing cross-species cell type identification.

      Weaknesses:

      However, several critical points need to be addressed.

      (1) Use of Liftoff for GTF Annotation

      The authors used Liftoff to generate GTF files for Pongo abelii, Macaca fascicularis, and Macaca mulatta by transferring the hg38 annotation to the corresponding primate genomes. However, it is unclear why they did not use species-specific GTF files, as all these genomes have existing annotations. Why did the authors choose not to follow this approach?

      (2) Transcript Filtering and Potential Biases

      The authors excluded transcripts with partial mapping (<50%), low sequence identity (<50%), or excessive length differences (>100 bp and >2× length ratio). Such filtering may introduce biases in read alignment. Did the authors evaluate the impact of these filtering choices on alignment rates?

      (3) Data Integration with Harmony

      The methods section does not specify the parameters used for data integration with Harmony. Including these details would clarify how cross-species integration was performed.
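
      The transcript filter summarized in point (2) can be written out explicitly, which makes it easier to count how many transcripts each criterion removes. The following is only a sketch of the criteria as paraphrased above: the function name and arguments are hypothetical, the length ratio is assumed to be longer length divided by shorter length, and the length criterion is assumed to require both conditions (>100 bp and >2×) at once.

```python
def keep_transcript(coverage, identity, ref_len, lifted_len):
    """Return True if a lifted-over transcript passes the three filters.

    coverage and identity are fractions in [0, 1]; lengths are in bp.
    All thresholds follow the reviewer's summary, not the authors' code.
    """
    if coverage < 0.5:   # partial mapping (<50%)
        return False
    if identity < 0.5:   # low sequence identity (<50%)
        return False
    shorter = min(ref_len, lifted_len)
    longer = max(ref_len, lifted_len)
    if shorter <= 0:     # guard against empty records
        return False
    # Excessive length difference: assumed to need BOTH conditions.
    if (longer - shorter) > 100 and (longer / shorter) > 2:
        return False
    return True
```

      Applying such a function to every transcript and tallying the rejections per criterion would be one way to report how much of the annotation the filtering removes in each species.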

    1. eLife Assessment

      This is an important study that establishes how anti-sense oligonucleotides degrading a specific target protein called EMC10 can rescue neuronal function in models of chromosome 22q11.2 deletions. The authors use human iPSC-derived neurons and a mouse model to provide compelling data for the rescue of cellular and cognitive features of 22q11.2 phenotypes upon ASO regulation of EMC10. These pre-clinical data are of interest because they support reduction of EMC10 as a promising therapeutic strategy.

    2. Reviewer #1 (Public review):

      Summary:

      This is an important and very well-presented set of experiments following up on prior work from the lab investigating knock-down (KD) of EMC10 in restoration of neuronal and cognitive deficits in 22q11.2 Del models, now including both human iPSCs and an in vivo mouse model treated with ASOs.

      The valuable progress in this current manuscript is the development of ASOs, and the proof of efficacy in vivo in mouse of the ASO in knock-down of EMC10 and amelioration of in vivo behavioral phenotypes.

      The experiments include: iPSC studies demonstrating elevations of EMC10 in a solid collection of paired iPSC lines. These studies also provide evidence of manipulation of EMC10 by overexpression and inhibition of miRNAs that exist in the 22q11 interval. The iPSC studies also nicely demonstrate rescue of impairments with KD of EMC10 in neuronal arborization as well as KCl induced neuronal activity. The major in vivo contributions reflect impressive demonstration of efficacy of two ASOs in vivo on both KD of EMC10 in vivo and through improvement in behavioral abnormalities in the 22q11 mouse in a range of different behaviors, including social behavior and learning behaviors.

      Overall, there are many strengths reflected in this study, including in particular the synergy between in vitro studies in human cell models and in vivo studies in the well characterized mouse model. The experiments are generally rigorously performed and well powered, and nicely presented. The claims with regard to the mechanisms of EMC10 elevations and the importance of restoration of EMC10 expression to neuronal morphology and behavior are well supported by the data. The work may be further supported in future studies by investigating ASO-mediated rescue of circuit dysfunction in vivo or ex vivo through electrophysiology in the mouse model. Future studies could also investigate the mechanism by which EMC10, an ER protein involved in protein processing, contributes to the observed neuronal abnormalities; however, these questions are clearly for future investigations.

      The potential impact of the work is found in the potential value of the ASO approach to the treatment of 22q11, or the pre-clinical evidence that knock-down of this protein may lead to some amelioration of cognitive symptoms. Overall, a very convincing and complementary set of experiments to support EMC10 KD as a therapeutic strategy.

      Review of revision: The authors have addressed the questions from the prior review.

    3. Author response:

      The following is the authors’ response to the original reviews.

      We appreciate that both reviewers found our findings significant and recognized the strength of the presented data in demonstrating the potential value of ASO-mediated Emc10 expression modulation for treating 22q11.2DS. We are grateful for the reviewers' valuable input and constructive suggestions, which we believe have significantly strengthened our manuscript. Below, we address the main points and concerns, followed by our point-by-point responses:

      Evaluation of ASO-Mediated Emc10 Reduction: We appreciate the feedback and the opportunity to clarify this point. While we agree that ASO-mediated reduction of Emc10 should ideally be evaluated at both the mRNA and protein levels, we would like to emphasize that this was indeed performed in our study. Specifically, we conducted both qRT-PCR and Western Blot (WB) assays on the same animal cohort, focusing on the left and right hippocampus (rather than the PFC) following ASO injection (see Figure S11C and D). We prioritized the hippocampus for the WB assay because our primary behavioral assays and observed phenotypes in this study are strongly hippocampus-centric. This approach reflects our aim to investigate Emc10's role in the brain regions most relevant to the observed phenotypes. We hope this clarification addresses the reviewer’s concerns. While protein-level analysis would ideally complement RNA measurements, the Emc10 antibodies available were suboptimal in specificity and sensitivity, requiring substantial optimization. Additionally, challenges in obtaining sufficient high-quality protein from small regions like the hippocampus limited the use of protein detection as a standalone method. We plan to refine antibody protocols or explore alternative methods in future work. Notably, in all instances where we performed parallel protein and RNA measurements in both mouse brain tissue and human cell lines, there was excellent concordance between the datasets, strongly suggesting that mRNA levels are a reliable indicator of Emc10 protein levels in our model.

      ASO Neuronal Uptake: While ASO uptake by neurons in the brain can vary considerably depending on factors such as ASO chemistry, delivery method, target brain region, and cell type, our targeted delivery approach, ASO design optimization, and ASO screening strategy were specifically tailored to achieve uniform and efficient uptake across hippocampal and cortical regions, in both neurons and glia. The figures included in our manuscript at both low and high magnification (see Figure S14A) clearly display the extensive (over 97%) overlap of ASO-positive cells (green signal) with cells expressing the neuronal marker NeuN (red signal). While quantifying ASO-positive cells in different brain regions could add value, the robust diffusion of ASO into neurons and glia is effectively demonstrated in the current figures and indirectly supported by the robust downregulation of Emc10 in ASO-treated animals as shown by qRT-PCR assays of hippocampal and cortical brain regions.

      Transcriptomic Data in Mutant EMC10 NGN2-iNs: Reduction in EMC10 levels is not expected to directly affect transcription or to broadly reorganize the differential gene expression profile of the Q6/Q5 patient/control NGN2-iN lines. Accordingly, our transcriptional profiling was not designed to assess the direct impact of EMC10 deficiency on gene expression but rather to serve as an indirect measure of cellular pathways affected by the reduction in EMC10 levels in the patient Q6 line. We aimed to identify genes and related functional pathways differentially expressed between the Q6/Q5 patient/control lines, where these expression differences are either abolished or significantly attenuated in Q6/EMC10<sup>HET</sup> or Q6/EMC10<sup>HOM</sup> NGN2-iNs.

      Statistical Analysis: We have meticulously reviewed all statistical analyses in the manuscript to ensure their appropriateness and adherence to established practices. For Figure S2, we acknowledge that the statistical details were not fully specified in the figure legend, though they are provided for each miRNA in Supplemental Table S2. In the revised manuscript, we ensured that the statistical methods and corresponding values are clearly indicated for each comparison.

      We are confident that the revisions outlined above, along with the point-by-point responses provided below, will significantly strengthen our manuscript and address all the concerns raised by the reviewers. We would like to express our sincere thanks to the reviewers for their valuable feedback and constructive suggestions.

      Reviewer #1 (Recommendations For The Authors):

      My comments here are generally limited to minor comments that reflect possible small additions or edits to the manuscript:

      (1) Panel 1A is very small. Please consider making that bigger as space permits.

      We have increased the panel size of Figure 1A in the revised manuscript to improve its visibility and clarity.

      (2) Are you able to identify the dot that represents EMC10 in panel 1C? I understand that EMC10 is represented in Supplementary Figure 4A.

      We appreciate the reviewer's observation. In Figure 1C, the volcano plot depicts differentially expressed miRNAs in the Q5/Q6 neuronal samples, as identified through miRNA-sequencing. EMC10, as a protein-coding gene and a downstream target of miRNA dysregulation, is therefore not included in this analysis. However, as the reviewer correctly notes, EMC10 gene expression is represented in the volcano plot in Supplementary Figure 4A, which displays differentially expressed genes identified through bulk RNA-seq analysis of the same neuronal samples. To avoid any confusion, we have clarified the title of Figure 1C to emphasize that it represents miRNA expression changes.

      (3) With regard to studies using iPSC. Some of the studies are executed across multiple distinct pairs and some are only done in a single pair. Overall, while results are coherent and often complementary, would it be valuable for the authors to comment on experiments where studies in multiple pairs seemed particularly important, or others wherein it was less important?

      We thank the reviewer for this insightful question regarding our use of multiple versus single hiPSC pairs. Our investigation began with the Q5/Q6 sibling (dizygotic twin) pair, which shares the most similar genetic background. This minimized the impact of confounding genetic factors and provided a robust foundation for testing our hypothesis that EMC10 upregulation, driven by miRNA dysregulation, is a key consequence of the 22q11.2 deletion in human neurons, thus validating our previous findings from the Df(16)A<sup>+/-</sup> mouse model (Stark et al., 2008; Xu et al., 2013). To ensure the generalizability of our findings, we incorporated additional hiPSC lines from another sibling pair as well as a case/control pair, demonstrating that EMC10 upregulation is a consistent feature of 22q11.2DS. Subsequently, we focused on the well-matched Q5/Q6 pair for detailed morphological, functional, and genetic rescue experiments. This approach allowed us to perform in-depth studies while controlling for potential genetic confounders. By using both multiple and single hiPSC pairs, we balanced the need for generalizable findings with the practical considerations of conducting technically complex and resource-intensive experiments. This strategy enabled us to provide both broad and detailed insights into the mechanisms underlying 22q11.2DS. We have modified the introductory paragraph of the Results section to better highlight this issue.

      (4) While the majority of the experiments seem sufficiently powered to test the hypothesis in question in the iPSC studies, Figure 2B raises the question if the study replicates here were underpowered, and perhaps the authors might consider mentioning this, although this is a very minor comment.

      We thank the reviewer for raising this point. We acknowledge that the statistical power to detect a significant difference in pre-miR-485 levels in Figure 2B may be limited due to the relatively small sample size and the inherent variability in hiPSC-derived neuronal cultures. However, it is important to emphasize that the functional impact of miRNAs is primarily mediated by their mature transcript forms. Our miRNA-seq data (Supplementary Table 2 and Figure S2) did not show significant alterations in the levels of mature miR-485-5p or miR-485-3p. This finding aligns with the reported expression pattern of miR-485 in hiPSC-derived neurons, where relatively low levels are observed in early neuronal development, with increased expression occurring in older, more mature neurons (Soutschek et al. 2023; https://ethz-ins.org/igNeuronsTimeCourse/ database from the Institute of Neurogenomics, ETH Zurich). This database provides a valuable resource for examining gene expression dynamics during human neuronal differentiation. Given that our hiPSC-derived neurons were analyzed at a relatively early developmental stage (DIV8 for these experiments), it is likely that miR-485 expression had not yet reached levels sufficient to reveal significant differences. While we acknowledge the potential limitation in statistical power for detecting subtle changes in pre-miR-485 levels, the combined evidence suggests that miR-485 may not be a significant contributor to the observed phenotypes at this developmental stage.

      A paragraph has been added in the corresponding Results section to address this issue.

      (5) There are a few situations where the authors could help out the reader a little bit by providing more labels on the figures directly. For example: in Figure 2, there are expression levels, over-expression, and inhibition of miRNA but the X-axis is named with similar labels for the miRNAs in question for each of these distinct experiments. If the authors want to help the reader, they may consider labeling these panels with a descriptive title to reflect the experiment being done or use more descriptive terms in the X-axis panels. Again, this is minor. Similarly, in Figure 5, it might be helpful for the authors to help out the reader again with more labels on the panels, such as in Figures 5B, 5C, and 5D. Would they consider labeling these panels, HPC, PFC, SSC with the brain location as they did in Figure 4?

      We thank the reviewer for these helpful suggestions to improve the clarity of our figures. We have implemented the proposed changes. In Figure 2C-E, we have added specific titles to the panels to clearly distinguish between the different experimental conditions such as miRNA overexpression and inhibition. Similarly, in Figure 5, we labeled panels 5B, 5C, and 5D with the brain regions analyzed (HPC, PFC, SSC) to match the labeling used in Figure 4. We believe these revisions enhance the readability and overall interpretability of the figures, making it easier for readers to follow the experiments and results.

      (6) Figure 3: There is some overshoot of the data in EMC10 homozygous null, in panel 3E, and also, overshoot of the het in panel 3H. Would there be value in the authors commenting on the potential basis for this in the discussion? Some issues are minor, such as the lack of electrophysiological analysis of circuits in vivo or in ex vivo slices that may further support the proposed rescue.

      The reviewer correctly highlights the observation in Figures 3E and 3H, where the number of branch points in the Q6/EMC10<sup>HOM</sup> line exceeds wildtype levels and the calcium response in the Q6/EMC10<sup>HET</sup> and Q6/EMC10<sup>HOM</sup> lines surpasses that of the control. This overshoot is indeed intriguing and warrants discussion. EMC10 is part of the ER Membrane Complex (EMC), which plays a critical role in the proper folding and localization of various membrane proteins, including neurotransmitter receptors and ion channels such as voltage-gated calcium channels (Chitwood et al., 2018; Shurtleff et al., 2018; Chitwood and Hegde, 2019). In the context of the 22q11.2 deletion, EMC10 dysregulation may disrupt the proper localization of these proteins at the synapse, affecting both dendritic morphology and calcium signaling. The precise basis of this overshoot remains unclear. The overshoot may result from a dosage-sensitive inhibitory effect of Emc10, where both reduced and increased expression alter normal neuronal processes, with excessive responses potentially triggered upon gene restoration by the mutant system’s adaptation to dysfunction, leading to altered receptor sensitivity or signaling dynamics. This underscores the critical importance of precise Emc10 expression for proper neuronal development and function, in line with previous findings suggesting that EMC10 plays an auxiliary or modulatory role in EMC function. A short comment on the potential basis for this overshoot has been added in the corresponding Results section of the manuscript. Regardless of the underlying mechanisms, these findings emphasize the importance of precise titration of ASO constructs, rigorous gene dosage controls, and thorough analysis of context-specific responses to ensure both efficacy and safety in clinical applications.

      We also agree with the reviewer that electrophysiological studies, particularly in the 22q11.2 deletion mouse model, would provide valuable insights into the impact of EMC10 modulation by ASOs on neuronal activity and circuit function at the in vivo and ex vivo levels. Incorporating such experiments into future studies will allow us to assess synaptic transmission and plasticity, contributing to a more comprehensive understanding of the therapeutic potential of ASO-mediated EMC10 modulation in 22q11.2DS.

      (7) Did the authors take out the behavior studies further than 9 weeks? Would the authors consider commenting on what they speculate might be the duration of the treatment effect? For both mice and definitely humans.

      We thank the reviewer for raising the important question regarding the duration of the ASO treatment effect, which is crucial for translating our findings into clinically relevant therapies. While behavioral studies beyond 9 weeks were not conducted in this study, our in vivo experiments and findings from prior publications (detailed below) enable an informed speculative assessment.

      We utilized 2'-O-methoxyethyl (2'-MOE) modified ASOs, known for their enhanced binding affinity, nuclease resistance, and increased metabolic stability. In our in vivo post-injection screening of ASOs (Figure S13C), we predicted that Emc10 expression levels return to normal WT levels (~100%) approximately 26 weeks post-treatment in Emc10<sup>ASO</sup> (#1466182) treated mice. This prediction is supported by our Emc10 expression profiles across various brain regions, which demonstrate robust repression of Emc10 lasting up to 10 weeks post-administration (Figure 6D-F). While these findings suggest that the treatment effect in our model could extend significantly beyond 10 weeks following a single ASO injection, further empirical validation is required through extended follow-up studies. Encouragingly, long-term effects of 2'-MOE ASOs have been observed in other neurological disorders (Kordasiewicz et al., 2012; Scoles et al., 2017; Finkel et al., 2017; Darras et al., 2019). However, factors such as ASO distribution, target cell turnover, and disease-specific pathophysiology could influence the duration of the effect. To address these uncertainties, we have added a paragraph in the Discussion section emphasizing the need for additional studies, including extended follow-up periods and eventual clinical trials, to determine the specific duration of effect for our Emc10<sup>ASO</sup> constructs in treating 22q11.2DS.

      Reviewer #2 (Recommendations For The Authors):

      (1) It is acknowledged that the iPSC-derived cells in Figure 1 are no longer progenitors, but differentiation markers for astrocytes and glia are also needed in Figure 1b to establish that equal rates of differentiation have occurred across genotypes.

      We thank the reviewer for raising this important point about ensuring equal rates of differentiation across genotypes. As the reviewer notes, we employed a well-established protocol for directed differentiation of hiPSCs into cortical neurons using a combination of small molecule inhibitors, as previously described by Qi et al. (2017). This protocol has been extensively validated and is known to robustly generate cortical neurons while actively suppressing glial differentiation, as evidenced by the lack of upregulation of glial markers such as GFAP, AQP4, or OLIG2 in the original study. Given the established neuronal specificity of this protocol and our focus on neuronal phenotypes, we prioritized the confirmation of successful neuronal differentiation using the established neuronal markers TUJ1 and TBR1. Therefore, additional markers for astrocytes and glia are not included in this figure, as we did not expect significant glial differentiation under these conditions. A sentence has been added in the corresponding Results section to address this issue.

      (2) For the RNA-seq experiments outlined in Figures 3J and K, a more comprehensive analysis is needed of the genes disrupted in the parental Q6 line relative to the het and homo lines. What percent are rescued, unaffected, vs uniquely disrupted?

      Reduction in EMC10 levels is not expected to directly affect transcription or broadly reorganize the gene expression profile of the Q6/Q5 NGN2-iN lines. Our transcriptional profiling was not designed to assess the direct impact of EMC10 deficiency on gene expression but rather to measure the cellular pathways affected by reduced EMC10 in the patient Q6 line. We identified genes differentially expressed between the Q6 (patient) and Q5 (control) lines, whose expression differences were either abolished or significantly attenuated ("rescued") in the Q6/EMC10<sup>HET</sup> or Q6/EMC10<sup>HOM</sup> lines. In the Q6/EMC10<sup>HET</sup> line, 237 DEGs (6%) were rescued, while in the Q6/EMC10<sup>HOM</sup> line, 382 DEGs (11%) were rescued. Importantly, further analysis revealed 103 shared rescued DEGs in these lines, which was statistically significant (enrichment factor = 1.7; p < 0.0001, based on a hypergeometric test). We added a new figure panel (Figure 3L) to visualize the significant overlap of rescued DEGs from the Q6/EMC10<sup>HET</sup> and Q6/EMC10<sup>HOM</sup> lines. This overlap suggests these genes play a critical role in biological pathways impacted by EMC10 levels, particularly in nervous system development, as indicated by our functional annotation analysis. We also performed protein-protein interaction (PPI) network analysis to explore the functional relationships among these 103 shared DEGs (Figure S8). Future studies will further investigate these gene sets to gain deeper insights into the molecular mechanisms underlying 22q11.2DS and the role of EMC10.
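
      The overlap statistic described above (an enrichment factor and a hypergeometric p-value for genes shared between two rescued sets) can be sketched as follows. This is a minimal illustration, not the authors' actual analysis: the function name `overlap_enrichment` and its parameters are hypothetical, and because the response does not state the size of the background gene universe, the example uses toy numbers rather than the reported counts.

```python
from math import comb


def overlap_enrichment(n_universe, n_set_a, n_set_b, n_shared):
    """Return (enrichment_factor, p_value) for the overlap of two gene sets.

    n_universe: size of the background gene set (an assumption the analyst must supply)
    n_set_a, n_set_b: sizes of the two gene sets drawn from that background
    n_shared: number of genes present in both sets
    """
    # Overlap expected by chance if the two sets were drawn independently.
    expected = n_set_a * n_set_b / n_universe
    enrichment = n_shared / expected

    # One-sided hypergeometric tail: P(overlap >= n_shared).
    total = comb(n_universe, n_set_b)
    upper = min(n_set_a, n_set_b)
    tail = sum(
        comb(n_set_a, k) * comb(n_universe - n_set_a, n_set_b - k)
        for k in range(n_shared, upper + 1)
    )
    return enrichment, tail / total


# Toy example (hypothetical values): 20 background genes, sets of 5 and 6, 3 shared.
enrichment, p_value = overlap_enrichment(20, 5, 6, 3)
print(f"enrichment factor = {enrichment:.2f}, p = {p_value:.4f}")
```

      To apply this to the reported counts (237 and 382 rescued DEGs, 103 shared), the background set would need to be chosen explicitly, since both the enrichment factor and the p-value depend on it.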

      (3) The authors claim that 50% EMC10 loss in adult mice is safe and should be toned down. EMC10 knockout mice have motor, anxiety, and social phenotypes. It would be unique amongst highly dosage-sensitive genes (MeCP2, CDKL5, TCF4, FMR1, etc.) for there to only be a neurodevelopmental component. In all those cases, and others, the effects of over and under-expression are reversible into adulthood. Establishing the range in adults is critical to establishing therapeutic utility. Absent a detailed examination of non-cognitive phenotypes, this claim cannot be made.

      The reviewer raises an important point about the potential effects of EMC10 reduction in adult mice and the need to establish a safe therapeutic window by evaluating both cognitive and non-cognitive phenotypes. We agree that such a comprehensive evaluation is critical for assessing the safety and translational potential of Emc10-targeting therapies. While the International Mouse Phenotyping Consortium reported motor and anxiety phenotypes in homozygous Emc10 knockout mice, these data are unpublished and based on a relatively small number of animals. Furthermore, in our previous work (Diamantopoulou et al., 2017), we demonstrated that complete Emc10 loss does not impair cognition or social behavior, as assessed by prepulse inhibition (PPI), working memory (WM), and social memory (SM) assays (see Figure 3A-D; Diamantopoulou et al., 2017). Additionally, heterozygous Emc10 mice, which exhibit a ~50% reduction in Emc10 expression similar to that achieved with our ASO treatment, showed no evidence of motor deficits or anxiety-like behavior. Specifically, Emc10<sup>+/-</sup> mice displayed locomotor activity comparable to WT mice in the open field (OF) test (Figure S4A, Diamantopoulou et al., 2017). Moreover, genetic normalization of Emc10 expression in Df(16)A<sup>+/-</sup> mice demonstrated no signs of anxiety-like behavior, as assessed by the OF test (Figure S4A) and elevated plus maze (EPM) (Figure S4B; Diamantopoulou et al., 2017). To further support these findings, we have added new data to the current manuscript (Figure S10J) showing that TAM treatment-mediated restoration of Emc10 levels in the brain of adult Df(16)A<sup>+/-</sup> mice did not affect the time that mutant mice spent in the center area of the OF, suggesting that Emc10 reduction does not influence anxiety-related behavior. These results suggest that a 50% reduction in EMC10 expression is unlikely to result in motor or anxiety-like phenotypes in adult mice. Finally, as noted in the manuscript, in addition to prior findings from animal models, a substantial number of relatively rare LoF variants or potentially damaging missense variants have been identified in the human EMC10 gene among likely healthy individuals in gnomAD, a database largely devoid of individuals known to be affected by severe neurodevelopmental disorders (NDDs).

      Nevertheless, the Discussion has been revised to underscore the importance of establishing a more detailed safety profile, including non-cognitive phenotypes, to fully validate the therapeutic potential of Emc10-targeting approaches. It also highlights the need for future studies to expand on these evaluations, addressing this critical aspect and laying a stronger foundation for advancing these findings into clinical drug development.

      (4) Supplemental Figure 10: The protein validation of Emc10 knockout following tamoxifen injection needs to be validated in all brain regions, not just the PFC. This is particularly important as the rest of the paper focuses on HPC-mediated phenotypes.

      First, we want to emphasize that we conducted both qRT-PCR and WB assays on the same animal cohort, specifically examining the left and right hippocampus following ASO injection (see Figure S11C and D). This approach is crucial, given the central role of hippocampus in the phenotypes investigated in our ASO-mediated Emc10 knockdown experiments.

      The reviewer raises an important point regarding the validation of EMC10 reduction at the protein level across all relevant brain regions using the Emc10 conditional knockout strain. We agree that such validation would ideally confirm the efficacy of our tamoxifen-induced knockout model comprehensively. However, we hope the reviewer appreciates that obtaining sufficient high-quality protein for WB analysis from smaller brain regions like the hippocampus poses a significant technical challenge. This difficulty is further compounded by the need to reserve the same samples for qRT-PCR to ensure consistency between mRNA and protein measurements. Importantly, our data from ASO-mediated Emc10 knockdown experiments (Figures S11C-D) demonstrate a clear and consistent correlation between reductions in Emc10 mRNA and protein levels in both the left and right hippocampus. Furthermore, in our constitutive Emc10-knockout mouse model (Diamantopoulou et al., 2017; see Figure S1A-B), we observed a strong agreement between mRNA and protein levels, supporting the reliability of mRNA data as a proxy for EMC10 protein levels in our experiments. Importantly, in all instances where we performed parallel protein and RNA measurements in human cell lines, there was excellent concordance between the datasets. Thus, while we acknowledge the limitations of relying primarily on mRNA data, we are confident that the Emc10 mRNA expression data in Figure S10 accurately reflect protein-level changes across brain regions in our conditional knockout model. To address this concern more fully in the future, we are working to refine antibody detection and optimize our protein extraction protocols to enable more routine and precise protein-level validation across smaller brain regions. We appreciate the reviewer’s feedback and will continue to refine our methodologies to strengthen the robustness of our findings.

      (5) Figure 3: 1 way ANOVA would be more appropriate to analyze the data in B-G than t-tests.

      We appreciate the suggestion of the reviewer. As mentioned above, we carefully selected statistical tests appropriate for each analysis. For Figure 3B-G, we chose to use pairwise t-tests to address specific hypotheses regarding the disease phenotype and rescue effects. This approach is consistent with prior experimental studies in the field, including our own (e.g., Xu et al., 2013; Figure 7H-I). Importantly, most of our t-tests yielded highly significant results (p < 0.001 or p < 0.01), reinforcing the robustness of our findings.
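
      For readers weighing the two approaches discussed here, the following is a minimal sketch of how a one-way ANOVA F statistic is computed across several groups, in contrast to separate pairwise t-tests. It is illustrative only: the function name `one_way_anova_f` and the toy values are hypothetical and are not data or code from the manuscript.

```python
def one_way_anova_f(groups):
    """Return (F statistic, df_between, df_within) for a list of groups.

    groups: list of lists of numeric observations, one inner list per condition.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of individual observations around their own group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy example (hypothetical values) standing in for control, patient, and rescue lines.
f_stat, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

      A significant F statistic indicates only that at least one group mean differs; pairwise comparisons with a multiple-testing correction would still be needed to address specific hypotheses such as phenotype versus rescue.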

      (6) Figure 5-6: Protein data is needed to complement the mRNA knockdown data.

      We agree with the reviewer on the importance of protein-level validation to complement the mRNA knockdown data. As mentioned in our response to Reviewer’s Comment (4), in all instances where we performed parallel protein and RNA measurements, either in mouse brain or human cell lines, we observed excellent concordance between the datasets. This supports the reliability of our mRNA data as a proxy for protein changes. Nevertheless, we acknowledge the value of including protein validation in future experiments and will consider incorporating it to further strengthen our findings.

      (7) The use of additional phenotypic measures is applauded in Figure 6, however, to appropriately interpret the data more is needed. Shao et al 2021 (Figure S9) show data from the International Mouse Phenotyping Consortium claiming EMC10 KO mice have gait, activity, and anxiety phenotypes. All of these parameters could impact the SM assay and the y-maze assay. Changes in SM interaction time could be linked to anxiety or motor impairments, but interpreted as cognitive deficits because these symptoms were not assessed. At a minimum, discussion is needed about this limitation, as well as the inclusion of distance explored in the SM and Y-maze assays.

      We thank the reviewer for their insightful comment regarding the potential influence of locomotor, gait, or anxiety phenotypes on the observed deficits in the SM and Y-maze assays. The behavioral phenotypes reported for Emc10 knockout mice by the International Mouse Phenotyping Consortium (https://www.mousephenotype.org/data/genes/MGI:1916933) were limited to homozygous female mice and based on a small sample size (4–6 females) compared to a larger WT control group. Moreover, these data are unpublished and thus challenging to evaluate fully. Importantly, no abnormal behaviors were reported for Emc10 heterozygous knockout mice in these datasets. Additionally, the claim by Shao et al. (2021) regarding cognitive impairments in Emc10 knockout mice based on our previous work (Diamantopoulou et al., 2017) is inaccurate.

      Our analysis of both the constitutive Emc10 knockout model (Diamantopoulou et al., 2017) and the current conditional Emc10 heterozygous knockout model consistently demonstrates that Emc10 reduction does not affect locomotor activity or anxiety-like behavior. In our earlier characterization of constitutive heterozygous Emc10 knockout mice (Emc10<sup>+/-</sup>), we observed no signs of anxiety-like behavior or motor impairments in OF assays (see Figure 2A-B and Figure S4A, Diamantopoulou et al., 2017). Similarly, results from Df(16)A<sup>+/-</sup> mice with genetically normalized Emc10 expression [Df(16)A<sup>+/-</sup>; Emc10<sup>+/-</sup>] also showed no indications of anxiety-like behavior or locomotor changes in the OF and EPM assays (see Figure S4A-B, Diamantopoulou et al., 2017). Consistent with these findings, our current data from Df(16)A<sup>+/-</sup> mice with conditional Emc10 reduction in the brain show no significant differences in locomotor activity and anxiety-related measures as assessed by OF assays (Figure S10J). Furthermore, total arm entries in Y-maze assays conducted in Df(16)A<sup>+/-</sup> mice treated with Emc10 ASOs were comparable to controls (Figures S14C and G-H), providing additional support for the conclusion that locomotor activity remains unaffected in these models.

      We further appreciate the reviewer’s suggestion that changes in social interaction time during the SM assay could be influenced by anxiety or motor impairments. However, we consider this scenario unlikely in our model. Interaction times during the first trial of the SM assay, which measures general social interest, are comparable between Df(16)A<sup>+/-</sup> mice with reduced Emc10 expression (either genetically or through ASO treatment) and WT controls (see Figures 4E, 5E, and S10G). These findings indicate that our mouse models do not exhibit inherent difficulties in initiating social interaction, as might be expected if motor impairments or heightened anxiety were present. Reduced social interaction is commonly used as a behavioral marker for anxiety in rodent studies (reviewed by Bailey and Crawley, Anxiety-Related Behaviors in Mice, 2009). “Anxious” mice typically exhibit decreased social interaction, spending less time engaging with other mice compared to non-anxious counterparts. However, the specific deficit we observe in the second trial of the SM assay—when mice are reintroduced to a familiar juvenile—is indicative of impaired social recognition memory, as previously documented for Df(16)A<sup>+/-</sup> mice (Piskorowski et al., 2016; Donegan et al., 2020). This deficit is distinct from the general social avoidance typically associated with heightened anxiety.

      Based on our comprehensive assessment of locomotor activity, anxiety-related behaviors, and social interaction, we conclude that the observed rescue of social memory and spatial memory deficits in mice with reduced Emc10 expression is most likely due to improved cognitive function rather than alterations in motor or anxiety-related domains.

      (8) For ASO optimization experiments, it is not sufficient to claim robust uptake. A quantitative measure is needed using the PO antibody showing what percentage of cells were positive for the ASO. Since the contention is that only Emc10 in excitatory neurons is important, it would be helpful if this also included a breakdown of ASO uptake in excitatory and inhibitory neurons and astrocytes.

      We thank the reviewer for highlighting the importance of quantifying ASO uptake and assessing cell-type specificity. To address this, we have added new data to the panel, as shown in the high-magnification images in Figure S14A. These images provide evidence that a large majority of NeuN-positive neurons exhibit a strong ASO signal. Specifically, we observed widespread ASO uptake (green) that extensively colocalized with the neuronal marker NeuN (red) in both the hippocampus and prefrontal cortex. Quantitative analysis of this overlap indicates that over 97% of NeuN-positive neurons were ASO-positive, demonstrating efficient neuronal uptake. This robust neuronal uptake aligns with the significant normalization of Emc10 levels and the behavioral improvements observed in ASO-treated Df(16)A<sup>+/-</sup> mice, further supporting the functional efficacy of our approach in modulating Emc10 expression within the relevant neuronal populations. Overall, the observed ASO uptake in neurons, as demonstrated by IHC, combined with RNA assays and the behavioral improvements in treated mice, strongly supports the efficacy of our approach in targeting Emc10 expression in the intended neuronal populations.

      (9) An interpretation is needed in Figure S3 as to why ~50% of the pathways increased are also present on the decreased list, i.e., G1/S transition, viral reproductive process, pos regulator of cell stress, etc. 4/10 GO terms are present in both increased and decreased groups in A and 7/10 in B.

      We thank the reviewer for pointing out the overlap between pathways enriched in both the upregulated and downregulated miRNA groups in Figure S3. This overlap likely reflects the complex nature of miRNA regulation, where individual miRNAs can target multiple genes within a pathway, and single genes can be regulated by multiple miRNAs, sometimes with opposing effects (reviewed in Bartel, 2009; Bartel, 2018). For example, in the “G1/S transition” pathway, upregulated miRNAs such as miR-92a-3p, miR-92b-3p, and miR-34a-5p may promote the transition by targeting cell cycle regulators like FBXW7, CDKN1C, and CDK6 (Zhou et al., 2015; Zhao et al., 2021; Oda et al., 2024). Conversely, downregulated miRNAs such as miR-143-3p and miR-200b are known to suppress the transition by targeting genes such as HK2 and GATA-4 (Zhou et al., 2015; Yao et al., 2013). Our analysis identified overlapping predicted target genes for both upregulated and downregulated miRNAs, supporting the notion that many genes are subject to complex regulation by multiple miRNAs with potentially synergistic or antagonistic effects. Thus, the enrichment of certain GO terms in both groups likely reflects this intricate interplay of miRNA-mediated gene regulation. Future investigations focusing on specific miRNA-target interactions within these pathways will be critical to fully elucidate the underlying mechanisms and better understand the functional consequences of these opposing regulatory effects.

      Minor Concerns:

      (1) Define SM before using it.

      We have defined the SM assay in the main text upon its first mention, where we describe the assay and its relevance to cognitive function (see page 11 of the revised manuscript).

      (2) Statistics have been run in Figure S2, but not presented. The text only states that the differences between groups are significant. Please add in.

      We have revised the legend of Figure S2 to include the specific statistical test used (Student's t-tests) and the corresponding p-values.

      (3) The switch from ASO1 to ASO2 between Figures 5 and 6 needs more discussion. Why were new ASOs generated when ASO1 worked?

      We thank the reviewer for their question regarding the transition from Emc10<sup>ASO1</sup> to Emc10<sup>ASO2</sup> between Figure 4 and Figures 5-6. Emc10<sup>ASO1</sup> served as our initial proof-of-concept ASO construct, successfully demonstrating the feasibility of inhibiting Emc10 mRNA expression and providing evidence for behavioral rescue in our mouse model. As outlined in the manuscript, Emc10<sup>ASO2</sup> targets a different region of the Emc10 transcript (intron 1, Figure 5A) compared to Emc10<sup>ASO1</sup> (intron 2, Figure 4A). This distinction provides an additional layer of validation for our targeting strategy and ensures specificity in modulating Emc10 expression. In addition, Emc10<sup>ASO1</sup> exhibited limited distribution in the brain, primarily targeting the hippocampus with weaker inhibition of Emc10 in other regions such as the cortex (Figure 4C, right panel). Emc10<sup>ASO2</sup> overcame this limitation and achieved broader brain distribution, as demonstrated by the qRT-PCR data in Figure 5C. Given that 22q11.2DS can affect multiple brain regions and cognitive domains beyond the hippocampus, achieving broader distribution of the ASO is critical for a more comprehensive assessment of therapeutic potential.

      (4) Page 3: Define "LoF"

      We have defined Loss-of-Function (LoF) in the main text where it is first mentioned in the Introduction, where we discuss the potential of using LoF mutations to devise therapeutic interventions (see page 3 of the revised manuscript).

      References

      Bailey and Crawley, Anxiety-Related Behaviors in Mice, In: Methods of Behavior Analysis in Neuroscience. 2nd edition. Boca Raton (FL): CRC Press/Taylor & Francis; Chapter 5, (2009).

      Bartel, MicroRNAs: target recognition and regulatory functions, Cell 136(2):215-33, (2009).

      Bartel, Metazoan MicroRNAs, Cell, 173(1):20-51, (2018).

      Chitwood et al., EMC Is Required to Initiate Accurate Membrane Protein Topogenesis, Cell 175, 1507-1519 e1516, (2018).

      Chitwood and Hegde, The Role of EMC during Membrane Protein Biogenesis, Trends Cell Biol. (5):371-384, (2019).

      Darras et al., Nusinersen in later-onset spinal muscular atrophy: Long-term results from the phase 1/2 studies, Neurology 92(21), (2019).

      Diamantopoulou et al., Loss-of-function mutation in Mirta22/Emc10 rescues specific schizophrenia-related phenotypes in a mouse model of the 22q11.2 deletion, Proc Natl Acad Sci U S A 114, E6127-E6136, (2017).

      Donegan et al., Coding of social novelty in the hippocampal CA2 region and its disruption and rescue in a 22q11.2 microdeletion mouse model, Nat Neurosci 23, 1365-1375, (2020).

      Finkel et al., Nusinersen versus Sham Control in Infantile-Onset Spinal Muscular Atrophy, N Engl J Med 377(18):1723-1732, (2017).

      Kordasiewicz et al., Sustained therapeutic reversal of Huntington's disease by transient repression of huntingtin synthesis, Neuron 74(6):1031-44, (2012).

      Oda et al., MicroRNA-34a-5p: A pivotal therapeutic target in gallbladder cancer, Mol Ther Oncol, 32(1):200765, (2024).

      Piskorowski et al., Age-Dependent Specific Changes in Area CA2 of the Hippocampus and Social Memory Deficit in a Mouse Model of the 22q11.2 Deletion Syndrome. Neuron 89, 163-176, (2016).

      Qi et al., Combined small-molecule inhibition accelerates the derivation of functional cortical neurons from human pluripotent stem cells. Nat Biotechnol 35, 154-163, (2017).

      Scoles et al., Antisense oligonucleotide therapy for spinocerebellar ataxia type 2, Nature 544(7650):362-366, (2017).

      Shao et al., A recurrent, homozygous EMC10 frameshift variant is associated with a syndrome of developmental delay with variable seizures and dysmorphic features, Genet Med 23, 1158-1162, (2021).

      Shurtleff et al., The ER membrane protein complex interacts cotranslationally to enable biogenesis of multipass membrane proteins, Elife 7, (2018).

      Soutschek et al., A human-specific microRNA controls the timing of excitatory synaptogenesis, bioRxiv, (2023).

      Stark et al., Altered brain microRNA biogenesis contributes to phenotypic deficits in a 22q11-deletion mouse model. Nat Genet 40, 751-760, (2008).

      Xu et al., Derepression of a neuronal inhibitor due to miRNA dysregulation in a schizophrenia-related microdeletion, Cell 152, 262-275, (2013).

      Yao et al., miR-200b targets GATA-4 during cell growth and differentiation, RNA Biol.10(4):465-8, (2013).

      Zhao et al., miR-92b-3p Regulates Cell Cycle and Apoptosis by Targeting CDKN1C, Thereby Affecting the Sensitivity of Colorectal Cancer Cells to Chemotherapeutic Drugs, Cancers 2;13(13):3323, (2021).

      Zhou et al., miR-92a is upregulated in cervical cancer and promotes cell proliferation and invasion by targeting FBXW7, Biochem Biophys Res Commun 458(1):63-9, (2015).

      Zhou et al., MicroRNA-143 acts as a tumor suppressor by targeting hexokinase 2 in human prostate cancer, Am J Cancer Res. 5(6):2056-6 (2015).

    1. eLife Assessment

      This study presents an important finding on the involvement of a Caspase 3-dependent pathway in the elimination of synapses for retinogeniculate circuit refinement and eye-specific territory segregation. This work fits well with the concept of "synaptosis" which has been proposed in the past. The evidence supporting the claims of the authors is convincing, demonstrating that caspase-3 activation is essential for microglial elimination of synapses during both brain development and neurodegeneration. The work will be of interest to investigators studying cell death pathways, neurodevelopment, and neurodegenerative disease.

    2. Reviewer #2 (Public review):

      This manuscript by Yu et al. demonstrates that activation of caspase-3 is essential for synapse elimination by microglia, but not by astrocytes. This study also reveals that caspase-3 activation-mediated synapse elimination is required for retinogeniculate circuit refinement and segregation of eye-specific territories in the dLGN in an activity-dependent manner. Inhibition of synaptic activity increases caspase-3 activation and microglial phagocytosis, while caspase-3 deficiency blocks microglia-mediated synapse elimination and circuit refinement in the dLGN. The authors further demonstrate that caspase-3 activation mediates synapse loss in AD: loss of caspase-3 prevented synapse loss in AD mice. Overall, this study reveals that caspase-3 activation is an important mechanism underlying the selectivity of microglia-mediated synapse elimination during brain development and in neurodegenerative diseases.

      A previous study (Gyorffy B. et al., PNSA 2018) has shown that caspase-3 signal correlates with C1q tagging of synapses (mostly using in vitro approaches), which suggests that caspase-3 would be an underlying mechanism of microglial selection of synapses for removal. The current study provides convincing in vivo evidence demonstrating that caspase-3 activation is essential for microglial elimination of synapses during both brain development and neurodegeneration.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this manuscript, the authors study the effects of synaptic activity on the process of eye-specific segregation, focusing on the role of caspase 3, classically associated with apoptosis. The method for synaptic silencing is elegant and requires intrauterine injection of a tetanus toxin light chain into the eye. The authors report that this silencing leads to increased caspase 3 in the contralateral eye (Figure 1) and demonstrate evidence of punctate caspase 3 that does not overlap neuronal markers like map2. However, the quantifications showing increased caspase 3 in the silenced eye (done at P5) are complicated by overlap with the signal from entire dying cells in the thalamus. The authors also show that global caspase 3 deficiency impairs the process of eye-specific segregation and circuit refinement (Figures 3-4).

      The reviewer states: “this silencing leads to increased caspase 3 in the contralateral eye”. We observed increased caspase-3 activity, not protein levels, in the contralateral dLGN, not eye.

      The reviewer states: “and demonstrate evidence of punctate caspase 3 that does not overlap neuronal markers like map2”. We do not believe that this statement is accurate, as we show that the punctate active caspase-3 signals overlap with the dendritic marker MAP2 (Figure S4A).

      The reviewer also states: “, the quantifications showing increased caspase 3 [activity] in the silenced [dLGN] (done at P5) are complicated by overlap with the signal from entire dying cells in the thalamus”. We do not believe that this statement is accurate. The apoptotic neurons we observed are relay neurons (confirmed by their morphology and positive staining of NeuN – Figure S4B-C) located in the dLGN (the dLGN is clearly labeled by expression of fluorescent proteins in RGCs, and only caspase-3 activity in the dLGN area is analyzed), not “cells” of unknown lineage (as suggested by the reviewer) in the general “thalamus” area (as suggested by the reviewer). If the dying cells were non-neuronal cells, that would indeed confound our quantification and conclusions, but that is not the case.

      We argue that whole-cell caspase-3 activation in dLGN relay neurons is a bona fide response to synaptic silencing by TeTxLC and therefore should be included in the quantification. We have two sets of controls: one is between the strongly inactivated dLGN and the weakly inactivated dLGN in the same TeTxLC-injected animal; and the second is between the dLGN of TeTxLC-injected animals and mock-injected animals. In both controls, only the dLGNs receiving strong synapse inactivation have more apoptotic dLGN relay neurons, demonstrating that these cells occur because of synapse inactivation. It is also unlikely that our perturbation is causing cell death through a non-synaptic mechanism. Since mock injections do not cause apoptosis in dLGN neurons, this phenomenon is not related to surgical damage. TeTxLC is injected into the eyes and only expressed in presynaptic RGCs, not in postsynaptic relay neurons, so this phenomenon is also unlikely to be caused by TeTxLC-related toxicity. Furthermore, if apoptosis of dLGN relay neurons is not related to synapse inactivation, then when TeTxLC is injected into both eyes, one would expect to see either the same amount or more apoptotic relay neurons, but we instead observed a reduction in dLGN neuron apoptosis, suggesting that synapse-related mechanisms are responsible. Considering the above, occasional whole-cell caspase-3 activation in relay neurons in TeTxLC-inactivated dLGN is causally linked to synapse inactivation and should be included in the quantification.

      We also revised the manuscript to better explain the possible mechanistic connection between localized caspase-3 activity and whole-cell caspase-3 activity. We propose that whole-cell caspase-3 activation occurs because of uncontrolled accumulation of localized caspase-3 activation. Please see lines 127-140 and lines 403-413 for details.

      Additionally, we would like to clarify that we are not claiming that synapse inactivation leads to only localized caspase-3 activation or only whole-cell caspase-3 activation, as is suggested by the editors and reviewers in the eLife assessment. We have clearly stated in the manuscript that both types of signals were observed. However, we reasoned that, because whole-cell caspase-3 activation in unperturbed dLGNs – which undergo normal synapse elimination – is infrequently observed, whole-cell caspase-3 activation may not be a significant driver of synapse elimination during normal development. In this revision, we included a new experiment to corroborate this hypothesis. If whole-cell caspase-3 activation in dLGN relay neurons is a prevalent phenomenon during normal development, such caspase-3 activity would lead to significant death of dLGN relay neurons during normal development. Consequently, if we block caspase-3 activation by deleting caspase-3, the number of relay neurons in the dLGN should increase. However, in support of our hypothesis, we observed comparable numbers of relay neurons in Casp3<sup>+/+</sup> and Casp3<sup>-/-</sup> mice. Please see Figure S7 for details.

      The authors also report that "synapse weakening-induced caspase-3 activation determines the specificity of synapse elimination mediated by microglia but not astrocytes" (abstract). They report that microglia engulf fewer RGC axon terminals in caspase 3 deficient animals (Figure 5), and that this preferentially occurs in silenced terminals, but this preferential effect is lost in caspase 3 knockouts. Based on this, the authors conclude that caspase 3 directs microglia to eliminate weaker synapses. However, a much simpler and critical experiment that the authors did not perform is to eliminate microglia and show that the caspase 3 dependent effects go away. Without this experiment, there is no reason to assume that microglia are directing synaptic elimination.

      The reviewer states: “microglia engulf fewer RGC axon terminals in caspase 3 deficient animals (Figure 5), and that this preferentially occurs in silenced terminals, but this preferential effect is lost in caspase 3 knockouts”. We are not sure what the reviewer means by “this preferentially occurs in silenced terminals”. Our results show that microglia preferentially engulf silenced terminals, and such preference is lost in caspase-3 deficient mice (Figure 6).

      We do not understand the experiment where the reviewer suggested to: “eliminate microglia and show that the caspase 3 dependent effects go away”. To quantify caspase-3 dependent engulfment of synaptic material by microglia or preferential engulfment of silenced terminals by microglia, microglia must be present in the tissue sample. If we eliminate microglia, neither of these measurements can be made. What could be measured if microglia are eliminated is the refinement of retinogeniculate pathway. This experiment would test whether microglia are required for caspase-3 dependent phenotypes. This is not a claim made in the manuscript. Instead, we claimed caspase-3 is required for microglia to engulf weak synapses, as supported by the evidence presented in Figure 6.

      We did not claim that “microglia are directing synaptic elimination”. Our claim is that synapse inactivation induces caspase-3 activity, and caspase-3 activation in turn leads to engulfment of weak synapses by microglia. Based on this model, it is the neuronal activity that fundamentally directs synapse elimination. Synapse engulfment by microglia is only a readout we used to measure the outcome of activity-dependent synapse elimination. We have revised all sections in the manuscript that are related to synapse engulfment by microglia to emphasize the logic of this model.

      We have also revised the abstract and title of the paper to better align it with our main claims, removed the reference to astrocytes, and clarified that microglia engulfment measurements are used as readouts of synapse elimination.

      Finally, the authors also report that caspase 3 deficiency alters synapse loss in 6-month-old female APP/PS1 mice, but this is not really related to the rest of the paper.

      We respectfully disagree that Figure 7 is not related to the rest of the paper. Many genes involved in postnatal synapse elimination, such as C1q and C3, have been implicated in neurodegeneration. It is therefore natural and important to ask whether the function of caspase-3 in regulating synaptic homeostasis extends to neurodegenerative diseases in adult animals. The answer to this question may have broad therapeutic impacts.

      Reviewer #2 (Public Review):

      Summary:

      This manuscript by Yu et al. demonstrates that activation of caspase-3 is essential for synapse elimination by microglia, but not by astrocytes. This study also reveals that caspase-3 activation-mediated synapse elimination is required for retinogeniculate circuit refinement and eye-specific territory segregation in dLGN in an activity-dependent manner. Inhibition of synaptic activity increases caspase-3 activation and microglial phagocytosis, while caspase-3 deficiency blocks microglia-mediated synapse elimination and circuit refinement in the dLGN. The authors further demonstrate that caspase-3 activation mediates synapse loss in AD: loss of caspase-3 prevented synapse loss in AD mice. Overall, this study reveals that caspase-3 activation is an important mechanism underlying the selectivity of microglia-mediated synapse elimination during brain development and in neurodegenerative diseases.

      Strengths:

      A previous study (Gyorffy B. et al., PNAS 2018) has shown that caspase-3 signal correlates with C1q tagging of synapses (mostly using in vitro approaches), which suggests that caspase-3 would be an underlying mechanism of microglial selection of synapses for removal. The current study provides direct in vivo evidence demonstrating that caspase-3 activation is essential for microglial elimination of synapses in both brain development and neurodegeneration.

      The paper is well-organized and easy to read. The schematic drawings are helpful for understanding the experimental designs and purposes.

      Weaknesses:

      It seems that astrocytes contain large amounts of engulfed materials from ipsilateral and contralateral axon terminals (Figure S11B) and that caspase-3 deficiency also decreased the volume of engulfed materials by astrocytes (Figures S11C, D). So the possibility that astrocyte-mediated synapse elimination contributes to circuit refinement in dLGN cannot be excluded.

      We would like to clarify that we do not claim that astrocytes are unimportant for synapse elimination or circuit refinement. We acknowledge that the claim made in the original submitted manuscript that caspase-3 does not regulate synapse elimination by astrocytes lacks strong supporting evidence. We have removed this claim and revised the section related to synapse engulfment by astrocytes to provide a more rigorous interpretation of our data. We also removed the section in discussion regarding distinct substrate preferences of microglia and astrocytes.

      Does blocking single or dual inactivation of synapse activity (using TeTxLC) increase microglial or astrocytic engulfment of synaptic materials (of one or both sides) in dLGN?

      We assume that by “blocking single or dual inactivation of synapse activity”, the reviewer refers to inactivating retinogeniculate synapses from one or both eyes.

      We showed that inactivating retinogeniculate synapses from one eye (single inactivation) increases engulfment of inactive synapses by microglia (Figure 6). We did not measure synapse engulfment by microglia while inactivating retinogeniculate synapses from both eyes (dual inactivation). However, based on the total active caspase-3 signal (Figure 2) in the dual inactivation scenario, we do not expect to see an increase in engulfment of synaptic material by microglia.

      We did not measure astrocyte-mediated engulfment with single or dual inactivation, as we did not see a robust caspase-3 dependent phenotype in synapse engulfment by astrocytes.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the Authors):

      (1) Figure 1 - It is not clear from this figure whether the authors are measuring caspase 3 in dendritic compartments or in dying relay neurons in the thalamus. The authors state that "either" whole cell death (1B) or smaller punctate signals (1F) were observed. When quantifying "photons" in Figure 1E, it appears most of the signal captured will be of dying relay neurons. What determined which signal was observed, and what is being quantified in Figure 1E? This also applies to the quantifications being reported in Figure 2.

      The quantification includes both types of signals – it is the sum of all active caspase-3 signal within the dLGN boundary. We note that there is a significant amount of punctate signal in the TeTxLC-inactivated dLGN. Unfortunately, due to file compression, these signals are not clearly visible in the submitted manuscript file. We have provided high-resolution figures in this revision.

      As argued above in the response to the public review, apoptotic relay neurons in TeTxLC-inactivated dLGN (not the general thalamus area) occur as a direct consequence of synapse inactivation. Therefore, active caspase-3 signals in these relay neurons should be included in the quantification.

      We believe it is the extent of synapse inactivation (i.e., the number of synapses that are inactivated) that determines whether dLGN relay neuron apoptosis occurs or not. Such apoptosis is expected considering the nature of the apoptosis signaling cascade. In the intrinsic apoptosis pathway, release of cytochrome-c from mitochondria induces cleavage of the initiator caspase, caspase-9, and caspase-9 in turn cleaves the executioner caspases, caspase-3/7, which causes apoptosis. Caspase-3 can cleave upstream factors in the apoptosis pathway, leading to explosive amplification of caspase-3 activity (McComb et al., DOI: 10.1126/sciadv.aau9433). When a relay neuron receives a few inactivated synapses, caspase-3 activation in the postsynaptic dendrite can remain local (as we observed in Figure 1), constrained by mechanisms such as proteasomal degradation of cleaved caspase-3 (Erturk et al., DOI: 10.1523/JNEUROSCI.3121-13.2014). However, when a relay neuron receives many inactivated synapses, the cumulative caspase-3 activity induced in the dendrite can overwhelm negative regulation and lead to significantly higher levels of caspase-3 activity in entire dendrites (Figure S4B) through positive feedback amplification, eventually leading to caspase-3 activation in entire relay neurons. Please see line 127-140 and line 403-413 for our discussion in the main text.

      (2) Figure 5 - Figures 5c-d and Fig 6 are confounded by pseudoreplication, whereby performing statistics on 50-60 microglia inflates statistical significance. Could the authors show all these data per mouse?

      If we understand the reviewer correctly, the reviewer is suggesting that reporting measurements from multiple microglia in one animal constitutes pseudoreplication. This is correct in a strict sense, as microglia in the same animal are more likely to be similar than microglia from different animals. In the revised version, we have plotted the data by animal in Figures S11 and S13. The observations remain valid. However, we would like to point out that averaging measurements from all microglia in each animal and reporting by mouse is very conservative, as measurements from microglia in the same animal still vary greatly due to cell-to-cell differences.
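
      The per-animal analysis discussed above can be sketched as follows. This is only an illustration under stated assumptions: the engulfment values, animal IDs, and genotype labels are invented, and Welch's t statistic is used as a stand-in for whichever test the authors applied. The point is simply that each animal contributes one value, so the sample size is the number of mice rather than the number of microglia.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical per-cell engulfment values keyed by (genotype, animal ID).
# All numbers are invented for illustration; they are not from the study.
cells = {
    ("ctrl", "m1"): [4.1, 5.0, 3.8, 4.6],
    ("ctrl", "m2"): [5.2, 4.9, 5.5],
    ("ctrl", "m3"): [4.4, 4.0, 4.7, 4.3],
    ("ko", "m4"): [2.9, 3.3, 2.7],
    ("ko", "m5"): [3.6, 3.1, 3.4, 3.0],
    ("ko", "m6"): [2.5, 3.2, 2.8],
}


def per_animal_means(data, genotype):
    """Collapse per-cell values to one mean per animal for one genotype."""
    return [mean(v) for (g, _), v in sorted(data.items()) if g == genotype]


def welch_t(a, b):
    """Welch's t statistic for two independent samples (unequal variances)."""
    var_a = stdev(a) ** 2 / len(a)
    var_b = stdev(b) ** 2 / len(b)
    return (mean(a) - mean(b)) / sqrt(var_a + var_b)


ctrl = per_animal_means(cells, "ctrl")
ko = per_animal_means(cells, "ko")
t_stat = welch_t(ctrl, ko)
print(f"n = {len(ctrl)} vs {len(ko)} animals, Welch t = {t_stat:.2f}")
```

      Averaging within animals discards cell-to-cell variance, which is why it is conservative; a mixed-effects model with animal as a random effect would be one alternative that retains the per-cell data.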

      (3) Although the authors are not the only ones to use this strategy, it is worth noting that performing all microglial experiments in Cx3cr1 heterozygotes could lead to alterations in microglial function that may not be reflective of their homeostatic roles.

      We acknowledge that Cx3cr1 heterozygosity could cause alterations in microglial physiology.

      While Cx3cr1 heterozygosity may impact microglia physiology, we note that the engulfment assay in Figure 5 is comparing microglia in Cx3cr1<sup>+/-</sup>; Casp3<sup>+/-</sup> and Cx3cr1<sup>+/-</sup>; Casp3<sup>-/-</sup> animals. Therefore, the impact of Cx3cr1 heterozygosity is controlled for in our experiment, and the observed difference in engulfed synaptic material in microglia is an effect specific to caspase-3 deficiency. However, we acknowledge that this difference could be quantitatively affected by Cx3cr1 heterozygosity.

      It is important to note that we did not perform all microglia engulfment analyses using Cx3cr1<sup>+/-</sup> mice. We have edited the manuscript to make this more clear. In the activity-dependent microglia engulfment analysis performed in Figure 6, we used Casp3<sup>+/+</sup> and Casp3<sup>-/-</sup> animals and detected microglia with anti-Iba1 immunostaining. Therefore, the impact of Cx3cr1 heterozygosity is not a problem for this experiment.

      Minor:

      (1) Figures are presented out of order, which makes the manuscript difficult to follow.

      We have revised text regarding the segregation analysis to align with the order of figures.

      (2) Figure S3 is very confusing- the terms "left" and "right" are used in three or four partly overlapping contexts (which eye, which injection, which panel or subpanel of the figure is being referred to). Would this not be more appropriately analyzed with a repeated measures ANOVA (multiple comparisons not necessary) rather than multiple separate T-tests?

      We have revised Figure S3 and S5 with better annotation and legends.

      Yes, it is possible to use a repeated-measures two-way ANOVA. The analysis reports a significant effect of genotype, with df = 1, SoS and MoS of 0.0001081, F(1,13) = 7.595, and p = 0.0164. We used multiple separate t-tests because we wanted to show how the genotype effect changes with increasing thresholds, whereas the two-way ANOVA provides only one overall p-value for genotype.

      (3) Could the authors clarify why the percentage overlap (in the controls) is so different between Figure 3C and Figure S3C, and why different thresholds are applied?

      This difference is primarily due to the difference in age. The data in Figure 3 and Figure S5 were acquired at P10, while the data in Figure S3 were acquired at P8. While the segregation process is largely complete by P8, segregation continues from P8 to P10. Therefore, overlap measured at P10 will be lower than that measured at P8. If we compare overlap at the same threshold (e.g., 10%) and at the same age in Figure 3 and S5, the overlap is very similar.

      The choice of threshold is related to the method of labeling. In Figure 3, RGC terminals are labeled with Alexa Fluor-conjugated cholera toxin subunit-beta (CTB). In Figure S3 and S5, RGC axons are labeled by expression of fluorescent proteins. CTB labels only membrane surfaces but yields stronger signals, which differ slightly at fine scales from those of fluorescent proteins, which are cell fillers. For Figure S3 and S5 (which use fluorescent protein labeling), higher thresholds such as those used in Figure 3 (which uses CTB labeling) can be applied and the same trend still holds, but the data will be noisier. Regardless of the small difference in thresholds used, the important observation is that the defects in TeTxLC-injected or caspase-3 deficient animals are clear across multiple thresholds.
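
      To make the threshold dependence concrete, the sketch below computes a percentage overlap across several thresholds. It is a toy illustration, not the authors' pipeline: the function name `percent_overlap`, the assumption that intensities are normalized to [0, 1] within the dLGN mask, and the pixel values are all hypothetical.

```python
def percent_overlap(contra, ipsi, threshold):
    """Percentage of pixels exceeding `threshold` in both channels.

    `contra` and `ipsi` are equal-length lists of pixel intensities,
    assumed (for this sketch) to be normalized to [0, 1] within the dLGN.
    """
    if len(contra) != len(ipsi):
        raise ValueError("channels must have the same number of pixels")
    both = sum(1 for c, i in zip(contra, ipsi) if c > threshold and i > threshold)
    return 100.0 * both / len(contra)


# Hypothetical toy intensities for the contralateral and ipsilateral channels.
contra = [0.9, 0.8, 0.7, 0.3, 0.2, 0.05, 0.6, 0.4]
ipsi = [0.1, 0.3, 0.6, 0.7, 0.9, 0.02, 0.15, 0.35]

thresholds = (0.05, 0.10, 0.20, 0.30)
overlaps = [percent_overlap(contra, ipsi, t) for t in thresholds]
for t, o in zip(thresholds, overlaps):
    print(f"threshold {t:.2f}: overlap {o:.1f}%")
```

      Because raising the threshold can only remove pixels from the overlap set, the measured overlap never increases with threshold. Comparisons between figures are therefore meaningful only at matched thresholds and matched ages.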

      (4) Many describe the eye-specific segregation process as being complete "between P8-10". Other studies have quantified ESS at P10 (Stevens 2007). The authors state they did all quantifications at P8 (l. 82) and refer to Figure 3, but Figure 3 shows images from P10, whereas Figure S3 shows data from P8.

      We did not say we performed all quantification at P8. In line 85, we said “To validate the efficacy of our synapse inactivation method, we injected AAV-hSyn-TeTxLC into the right eye of wildtype E15 embryos and analyzed the segregation of eye-specific territories at postnatal day 8 (P8), when the segregation process is largely complete”. The age of postnatal day 8 in this context is specifically referring to the experiment shown in Figure S3. For the segregation analysis in Figure 3, we specifically stated that the experiment was conducted at P10 (line 277).

      Although the experiment in Figure S3 was conducted at P8, and Figure S5 and Figure 3 show results at P10, each dataset always included appropriate age-matched controls. P8 is generally considered an age at which segregation is mostly complete, and it is sufficient for us to assess the effect of AAV-delivered TeTxLC on eye-specific segregation. We do not think that performing the experiment shown in Figure S3 at P8 affects the interpretation of the data.

      (5) Is Figure 6 also using Cx3cr1 GFP to label microglia? This is not clarified.

      We apologize for this oversight. In Figure 6 microglia are labeled by anti-Iba1 immunostaining. We have clarified this in figure legends and text.

      Reviewer #2 (Recommendations for the Authors):

      (1) The authors quantified the caspase-3 activity using immunostaining and confocal microscopy (Figures 1B-E). They may need to verify the result (increased level of activated caspase-3 upon synapse inactivation) using alternative methods, such as western blotting.

      Both western blot and immunostaining are based on antibody-antigen interaction. These two methods are not likely sufficiently independent. Additionally, to perform a western blot, we would need to surgically collect the TeTxLC-inactivated dLGN to avoid sample contamination from other brain regions. Such collection at the age we are interested in (P5) is very challenging. We have tested the anti-cleaved caspase-3 antibody using caspase-3 deficient mice and we can confirm it is a highly specific antibody that doesn’t generate signal in the caspase-3 deficient tissue samples.

      (2) Does caspase-3 deficiency alter the density of microglia or astrocytes in dLGN?

      No. Neither the density of microglia nor that of astrocytes changed with caspase-3 deficiency. In the case of microglia, we find that the mean density of microglia per unit area of dLGN is virtually the same in wild type and caspase-3 deficient mice (two-tailed t test P = 0.8556, 6 wild type and 5 Casp3<sup>-/-</sup> mice). Some overviews showing microglia in dLGNs of wildtype and caspase-3 deficient mice can be found in Figure S10. Similarly, for astrocytes, we did not observe overt changes in astrocyte density in the dLGN linked to caspase-3 deficiency.

      (3) During dLGN eye-specific segregation in normal developing animals, did the authors observe different levels of activated caspase-3 in different regions (territories)?

      For normal developing animals, the activated caspase-3 signal is generally sparse, and it is difficult to distinguish whether the signal is related to synapse elimination. For animals receiving TeTxLC-injection, we did notice that in the dLGN contralateral to the injection, where most inactivated synapses are located, the punctate caspase-3 signal tends to concentrate on the ventral-medial side of the dLGN (Figure 1B), which is the region preferentially innervated by the contralateral eye.

      (4) Recording of NMDAR-mediated synaptic currents may not be necessary for demonstrating that caspase 3 is essential for dLGN circuit refinement. In addition, the PPR may not necessarily reflect the number of innervations that a dLGN neuron receives. Instead, showing the changes in the frequency of mEPSCs (or synapse/spine density) may be more supportive.

      Thank you for the comment. We have performed the suggested mEPSC measurements and reported the results in revised Figure 4D-F.

      (5) Why is caspase 3 activation enhanced (compared to control) only at 4 months of age, when A-beta deposition has not formed yet, but not at later time points in AD mice (Figure S17)?

      A prevailing hypothesis in the field is that the form of A-beta that is most neurotoxic is the soluble oligomeric form, not the fibril form that leads to plaque deposition. As the oligomeric form appears before plaque deposition, the enhanced caspase-3 activation we observed at 4 months may reflect an increase in oligomeric A-beta, which occurs before any visible A-beta plaque formation.

      (6) The manuscript can be made more concise, and the figures more organized.

      We removed superfluous details and corrected text-figure mismatches in the revised manuscript to improve readability.

    1. eLife Assessment

      This valuable study reports on the characteristics of premotor cortical population activity during the execution and observation of a moderately complex reaching and grasping task. By using new variants of well-established techniques to analyse neural population activity, the authors provide solid evidence that while the geometry of neural population activity changes between execution and observation, their dynamics are largely preserved. Although these findings are novel and robust, pending additional controls and analyses, the authors should further clarify the functional implications of their findings.

    2. Reviewer #2 (Public review):

      The authors investigated the similarity (or lack thereof) of neural dynamics while monkeys reached to and manipulated one of 4 objects in each trial, compared to observing similar movements performed by experimenters. They focused on mirror neurons (MNs) and rather convincingly showed that MN dynamics are dissimilar during executing vs. observing actions. The manuscript has improved quite significantly compared to the previous version and I congratulate the authors for that. However, there are still a few points I would like to raise that I think will improve the manuscript scientifically and make it more pleasant to read.

      - I appreciate the nicely compiled literature review which provides the context for the manuscript.<br /> - Message: The takeaway message of the paper is inconsistent and changes throughout the paper. To me, the main takeaway is that observation and execution subspaces progress during the trial (Fig 4), and that they are distinct processes and rather dissimilar, as stated in #440-441, #634-635, etc. But the title of the paper implies the opposite. Some of the interpretations of the results (e.g., Fig 8) also imply similarity of dynamics.<br /> - Readability: I have many issues with the readability/organisation of the paper. Unfortunately, I still find the quality of data presentation low. Below I list a few points:<br /> (1) In 5 sessions out of 9, there are fewer than 20 neurons categorised as AE. This means this population is under-sampled in the data which makes applying any neural population techniques questionable. Moreover, the relevance of the AE analysis is also sometimes unclear: In Fig 4, the AE-related panels are just referred to once in the paper. Yet AE results are presented right next to the main results throughout the paper.<br /> (2) Figures are low resolution and pixelated. There are some faded horizontal and vertical lines in Fig1B that are barely visible. Moreover, it may be my personal preference, but I think Fig1 is more confusing than helpful. Although panel A shows some planes rotating, indicating time-varying dynamics, I couldn't understand what more panel B is trying to convey. The arrow of time is counterclockwise, but the planes progress clockwise (i > ii > iii). 
Similarly, panel C just seems to show some points being projected to orthogonal subspaces (even though later in the paper we'll see that observation and execution subspaces are not orthogonal), and the CCA subspace illustrated in the same high-d space, which mathematically may be inaccurate, as CCA projects the data to a new space.<br /> In Fig 2A, the objects are too small and pixelated as well. I suggest an overhaul of the figures to make the paper more accessible.<br /> (3) Clarity of the text: The manuscript text could be more concise, to the point, avoiding repetitions, self-consistent, and simply readable. To name a few issues: Single letter acronyms were used to refer to trial epochs (I/G/M/H). M alone has been re-defined 13 different times in the text as in: ...Movement (M)..., excluding every related figure. The acronym (I) refers to the instruction epoch, the high-d space in Fig 1, and panel I of some figures. The acronym MN for Mirror Neurons was defined 4 separate times in the text yet spelled out as Mirror Neuron more than 2 dozen times. CD is defined in the caption of Fig 3 and never used, despite condition-dependent being a common term in the text. Many sentences, e.g., "In contrast, throughout..." in #265-#269, and "To summarize,..." in #270-#275, are too long with difficult wording. To get the point from these sentences, I had to read them many times, and go back and forth between them and the figure. Rewriting such sentences makes the manuscript much more accessible.<br /> - Figure 3: It appears that the condition independent signal has been calculated by subtracting the average of the 4 neural trajectories in Fig 3A, corresponding to different objects. Whereas #133 suggests that it should be calculated by subtracting the average firing rate of different conditions. 
Assuming I got the methods right, dynamics being "knotted" (#234) after removing the condition independent signal could be because they are similar, so subtracting the condition independent signal leaves us with the noise component. This matters for the manuscript especially since this is the reason for performing the more sensitive instantaneous subspaces.<br /> - Decoding results: I appreciate that the authors improved the decoding results in this version of the manuscript. Now it is much more interesting. However oddly, it appears that only data from 1 monkey is shown. #370 says the results from the other 2 are similar. The decoding data from every monkey must be shown. If the results are similar, they must be at least in Supplements. Currently, only 1 session (out of 3) in the Observation condition seems to decode the object type. This effect, if consistent across animals and session, is very interesting on its own and challenges other claims in the paper.<br /> - Figure8: I reiterate the issue #7 in my previous review. I appreciate the authors clearing some methods, but my concern persists. As per line #839, spiking activity has been smoothed with a 50ms kernel. Thus, unless trial data is concatenated, I suspect the 100ms window used for this analysis is too short (small sample size), thus the correlation values (CCs) might be spurious. References cited in this section use a smaller smoothing kernel (30ms) and a much longer window (~450ms).<br /> Moreover, I don't know why the authors chose to show correlation values in 3D space! Values of Fig8C-red are impossible to know. Furthermore, the manuscript insists on CC values of the Hold period being high, which is probably correct. But I wonder why the focus on the Hold period? I think the most relevant epoch for analysing the MNs is the Movement where the actual action happens. Interestingly, in the movement epoch, the CC values are visibly low. 
The reason why Hold results are more important and why the CCs in Movement are so low should be clarified in the text. Especially, statements like that in #661 seem particularly unjustified.
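
      The sample-size concern about Figure 8 can be illustrated with a minimal sketch. It is not the authors' pipeline or data: it assumes 1 ms bins, treats the "50ms kernel" as a Gaussian with a 50-bin standard deviation, uses 4 latent dimensions of independent white noise, and pools 40 hypothetical trials. All function names are illustrative. Under these assumptions, two unrelated smoothed signals compared over a single 100-bin window tend to give a top canonical correlation close to 1, whereas pooling many windows before CCA tends to give a much lower value.

```python
import numpy as np


def gaussian_smooth(x, sd_bins):
    """Smooth each column of x (time x dims) with a Gaussian kernel truncated at 3 SD."""
    half = int(3 * sd_bins)
    t = np.arange(-half, half + 1)
    kernel = np.exp(-0.5 * (t / sd_bins) ** 2)
    kernel /= kernel.sum()
    return np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="same"), 0, x)


def canonical_correlations(x, y):
    """Canonical correlations of two (samples x dims) matrices via QR and SVD."""
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    qx, _ = np.linalg.qr(x)  # orthonormal basis of the column space of x
    qy, _ = np.linalg.qr(y)
    s = np.linalg.svd(qx.T @ qy, compute_uv=False)  # sorted, largest first
    return np.clip(s, 0.0, 1.0)


def smoothed_noise(rng, n_trials, window_bins, sd_bins, n_dims):
    """Independent smoothed noise, one window per trial, concatenated over trials."""
    pad = int(3 * sd_bins)  # extra samples so the kept window has no edge effects
    trials = []
    for _ in range(n_trials):
        raw = rng.standard_normal((window_bins + 2 * pad, n_dims))
        trials.append(gaussian_smooth(raw, sd_bins)[pad:pad + window_bins])
    return np.concatenate(trials, axis=0)


def mean_top_cc(n_trials, window_bins=100, sd_bins=50, n_dims=4, n_repeats=10, seed=0):
    """Average top canonical correlation between two UNRELATED smoothed signals."""
    rng = np.random.default_rng(seed)
    top = []
    for _ in range(n_repeats):
        x = smoothed_noise(rng, n_trials, window_bins, sd_bins, n_dims)
        y = smoothed_noise(rng, n_trials, window_bins, sd_bins, n_dims)
        top.append(canonical_correlations(x, y)[0])
    return float(np.mean(top))


if __name__ == "__main__":
    print("single 100-bin window:", mean_top_cc(n_trials=1))
    print("40 windows pooled:    ", mean_top_cc(n_trials=40, n_repeats=5))
```

      A shuffled or surrogate control of this kind, computed with the same window and kernel as the real analysis, would indicate how much of a reported CC could arise from smoothing and sample size alone.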

    3. Reviewer #3 (Public review):

      In their study, Zhao et al. investigated the population activity of mirror neurons (MNs) in the premotor cortex of monkeys either executing or observing a task consisting of reaching to, grasping, and manipulating various objects. The authors proposed an innovative method for analyzing the population activity of MNs during both execution and observation trials. This method made it possible to isolate the condition-dependent variance in neural data and to study its temporal evolution over the course of single trials. The method proposed by the authors consists of building a time series of "instantaneous" subspaces with single time step resolution, rather than a single subspace spanning the entire task duration. As these subspaces are computed at each time step, projecting neural activity from a given task time into them results in latent trajectories that capture condition-dependent variance while minimizing the condition-independent one. The authors then analyzed the time evolution of these instantaneous subspaces and revealed that a progressive shift is present in subspaces of both execution and observation trials, with slower shifts during the grasping and manipulating phases compared to the initial preparation phase. Finally, they compared the instantaneous subspaces between execution and observation trials and observed that neural population activity did not traverse the same subspaces in these two conditions. However, they showed that these distinct neural representations can be aligned with Canonical Correlation Analysis, indicating dynamic similarities of neural data when executing and observing the task. The authors speculated that such similarities might facilitate the nervous system's ability to recognize actions performed by oneself or another individual.

      Unlike other areas of the brain, the analysis of neural population dynamics of premotor cortex MNs is not well established. Furthermore, analyzing population activity recorded during non-trivial motor actions, distinct from the commonly used reaching tasks, serves as a valuable contribution to computational neuroscience. This study holds particular significance as it bridges both domains, shedding light on the temporal evolution of the shift in neural states when executing and observing actions. The results are moderately robust, and the proposed analytical method could potentially be used in other neuroscience contexts.

    4. Reviewer #4 (Public review):

      Summary:

      In this study, the authors explore the neural dynamics of mirror neurons in the premotor cortex, focusing on the relationship between neural activity during action execution and observation. The study presents a rich dataset from three monkeys, with recordings from two regions per monkey. The authors use a method to analyze instantaneous neural subspaces and track their temporal evolution. Consistent with prior literature, they report that execution and observation subspaces remain largely distinct throughout the trial. However, after applying canonical correlation analysis, they observe a notable alignment between execution and observation activities, suggesting the presence of shared neural codes. The study is well-designed, and the analyses are thoroughly documented, occasionally overly so in the main text. While most findings are compelling, I find the conclusions drawn from Figure 8 less convincing. Specifically, I am skeptical about the application of CCA in this context and the subsequent interpretations regarding execution-observation similarity, which is a central claim of the manuscript.

      • The authors cite Safaie et al. 2023 as a precedent for applying CCA to align neural population dynamics. However, in that study, CCA was used to align neural dynamics across different animals, a justifiable approach given that neural trajectories exist in separate neural state spaces for each animal. Here, CCA is applied to align execution and observation activities within the same neural state space of the same MNs. I find this application of CCA less well-justified, as it may overestimate execution-observation similarity.<br /> • The control conditions presented in Figures 8C and 8D are somewhat reassuring, as they show that the similarity introduced by CCA is not universally high. However, these controls appear to be limited to the Hold epoch. It remains unclear whether the same holds true for the Go and Movement epochs.<br /> • In Figure 5, the authors display low-dimensional representations of four objects across task epochs during execution (A) and observation (B). The diagonals of the matrices reveal clear differences between execution and observation configurations across all four epochs. The authors suggest using CCA to align these configurations; however, this alignment seems to require time-specific application of CCA for each epoch (as demonstrated in Figure 8 for the Hold epoch). The need for time-specific adjustments likely depends on the fact that execution and observation subspaces are continuously shifting over time (as authors show in Figure 4), but this approach appears to be a strained attempt to demonstrate similarity between execution and observation codes.<br /> • The authors themselves offer an alternative hypothesis (line 730): that "PM MN population activity during action observation, rather than representing movements made by another individual similar to one's own movements, instead may represent different movements one might execute oneself in response to those made by another individual". 
This interpretation appears more congruent with the data presented.<br /> • In the end, I am left with a sense of ambiguity: which analysis should be considered more reliable, the negligible correspondence between execution and observation activity depicted in Figure 7, or the considerable similarity shown in Figure 8? The authors should address this apparent contradiction and provide a clearer discussion to reconcile these findings.

    1. eLife Assessment

      This important study provides solid evidence that glucosylceramide synthase (GlcT), a rate-limiting enzyme for glycosphingolipid (GSL) production, plays a role in the differentiation of intestinal cells. Mutations in GlcT compromise Notch signaling in the Drosophila intestinal stem cell lineage, resulting in the formation of enteroendocrine tumors, and preliminary data suggest that a homolog of glucosylceramide synthase also influences Notch signaling in the mammalian intestine. While the outstanding strengths of the initial genetic and downstream pathway analyses are noted, there are weaknesses in the data regarding the potential role of this pathway in Delta trafficking. Nevertheless, this study opens the way for future mechanistic studies addressing how specific lipids modulate Notch signaling activity.

    2. Reviewer #1 (Public review):

      Summary:

      From a forward genetic mosaic mutant screen using EMS, the authors identify mutations in glucosylceramide synthase (GlcT), a rate-limiting enzyme for glycosphingolipid (GSL) production, that result in EE tumors. Multiple genetic experiments strongly support the model that the mutant phenotype caused by GlcT loss is due to a failure to convert ceramide into glucosylceramide. Further genetic evidence suggests that Notch signaling is compromised in the ISC lineage and may affect the endocytosis of Delta. Loss of GlcT does not affect wing development or oogenesis, suggesting tissue-specific roles for GlcT. Finally, an increase in goblet cells in UGCG knockout mice, not previously reported, suggests a conserved role for GlcT in Notch signaling in intestinal cell lineage specification.

      Strengths:

      Overall, this is a well-written paper with multiple well-designed and executed genetic experiments that support a role for GlcT in Notch signaling in the fly and mammalian intestine. I do, however, have a few comments below.

      Weaknesses:

      (1) The authors bring up the intriguing idea that GlcT could be a way to link diet to cell fate choice. Unfortunately, there are no experiments to test this hypothesis.

      (2) Why do the authors think that UGCG knockout results in goblet cell excess and not in the other secretory cell types?

      (3) The authors should cite other EMS mutagenesis screens done in the fly intestine.

      (4) The absence of a phenotype using NRE-Gal4 is not convincing. This is because the delay in its expression could be after the requirement for the affected gene in the process being studied. In other words, sufficient knockdown of GlcT by RNAi would not be achieved until after the relevant signaling between the EB and the ISC occurred. Dl-Gal4 is problematic as an ISC driver because Dl is expressed in the EEP.

      (5) The difference in Rab5 between control and GlcT-IR was not that significant. Furthermore, any changes could be secondary to increases in proliferation.

    3. Reviewer #2 (Public review):

      Summary:

      This study genetically identifies two key enzymes involved in the biosynthesis of glycosphingolipids, GlcT and Egh, which act as tumor suppressors in the adult fly gut. Detailed genetic analysis indicates that a deficiency in Mactosyl-ceramide (Mac-Cer) is causing tumor formation. Analysis of a Notch transcriptional reporter further indicates that the lack of Mac-Cer is associated with reduced Notch activity in the gut, but not in other tissues.

      Addressing how a change in the lipid composition of the membranes might lead to defective Notch receptor activation, the authors studied the endocytic trafficking of Delta and claimed that internalized Delta appeared to accumulate faster into endosomes in the absence of Mac-Cer. Further analysis of Delta steady-state accumulation in fixed samples suggested a delay in the endosomal trafficking of Delta from Rab5+ to Rab7+ endosomes, which was interpreted to suggest that the inefficient, or delayed, recycling of Delta might cause a loss in Notch receptor activation.

      Finally, the histological analysis of mouse guts following the conditional knock-out of the GlcT gene suggested that Mac-Cer might also be important for proper Notch signaling activity in that context.

      Strengths:

      The genetic analysis is of high quality. The finding that a Mac-Cer deficiency results in reduced Notch activity in the fly gut is important and fully convincing.

      The mouse data, although preliminary, raised the possibility that the role of this specific lipid may be conserved across species.

      Weaknesses:

      This study is not, however, without caveats and several specific conclusions are not fully convincing.

      First, the conclusion that GlcT is specifically required in Intestinal Stem Cells (ISCs) is not fully convincing for technical reasons: NRE-Gal4 may be less active in GlcT mutant cells, and the knock-down of GlcT using Dl-Gal4ts may not be restricted to ISCs given the perdurance of Gal4 and of its downstream RNAi.

      Second, the results from the antibody uptake assays are not clear: i) the levels of internalized Delta were not quantified in these experiments; ii) additionally, live guts were incubated with anti-Delta for 3 hr. This long period of incubation indicates that the observed results may not necessarily reflect the dynamics of endocytosis of antibody-bound Delta, but might also inform about the distribution of intracellular Delta following the internalization of unbound anti-Delta. It would thus be interesting to examine the level of internalized Delta in experiments with shorter incubation time. Overall, the proposed working model needs to be solidified as important questions remain open, including: Is the endo-lysosomal system, i.e. steady-state distribution of endo-lysosomal markers, affected by the Mac-Cer deficiency? Is the trafficking of Notch also affected by the Mac-Cer deficiency? Is the rate of Delta endocytosis also affected by the Mac-Cer deficiency? Are the levels of cell-surface Delta reduced upon the loss of Mac-Cer?

      Third, while the mouse results are potentially interesting, they seem to be relatively preliminary, and future studies are needed to test whether the level of Notch receptor activation is reduced in this model.

    4. Reviewer #3 (Public review):

      Summary:

      In this paper, Tang et al report the discovery of a glucosylceramide synthase gene, GlcT, which they found in a genetic screen for mutations that generate tumorous growth of stem cells in the gut of Drosophila. The screen was expertly done using a classic mutagenesis/mosaic method. Their initial characterization of the GlcT alleles, which generate endocrine tumors much like mutations in the Notch signaling pathway, is also very nice. Tang et al checked other enzymes in the glycosylceramide pathway and found that the loss of one gene just downstream of GlcT (Egh) gives similar phenotypes to GlcT, whereas three genes further downstream do not replicate the phenotype. Remarkably, dietary supplementation with a predicted GlcT/Egh product, Lactosyl-ceramide, was able to substantially rescue the GlcT mutant phenotype. Based on the phenotypic similarity of the GlcT and Notch phenotypes, the authors show that activated Notch is epistatic to GlcT mutations, suppressing the endocrine tumor phenotype, and that GlcT mutant clones have reduced Notch signaling activity. Up to this point, the results are all clear, interesting, and significant. Tang et al then go on to investigate how GlcT mutations might affect Notch signaling, and present results suggesting that GlcT mutation might impair the normal endocytic trafficking of Delta, the Notch ligand. These results (Fig X-XX), unfortunately, are less than convincing; either more conclusive data should be brought to support the Delta trafficking model, or the authors should limit their conclusions regarding how GlcT loss impairs Notch signaling. Given the results shown, it's clear that GlcT affects EE cell differentiation, but whether this is via directly altering Dl/N signaling is not so clear, and other mechanisms could be involved. Overall the paper is an interesting, novel study, but it lacks somewhat in providing mechanistic insight. With conscientious revisions, this could be addressed. 
We list below specific points that Tang et al should consider as they revise their paper.

      Strengths:

      The genetic screen is excellent.

      The basic characterization of GlcT phenotypes is excellent, as is the downstream pathway analysis.

      Weaknesses:

      (1) Lines 147-149, Figure 2E: here, the study would benefit from quantitations of the effects of loss of brn, B4GalNAcTA, and a4GT1, even though they appear negative.

      (2) In Figure 3, it would be useful to quantify the effects of LacCer on proliferation. The suppression result is very nice, but only effects on Pros+ cell numbers are shown.

      (3) In Figure 4A/B we see less NRE-LacZ in GlcT mutant clones. Are the data points in Figure 4B per cell or per clone? Please note. Also, there are clearly a few NRE-LacZ+ cells in the mutant clone. How does this happen if GlcT is required for Dl/N signaling?

      (4) Lines 222-225, Figure 5AB: The authors use the NRE-Gal4ts driver to show that GlcT depletion in EBs has no effect. However, this driver is not activated until well into the process of EB commitment, and RNAi's take several days to work, and so the authors' conclusion that GlcT is "specifically required in ISCs" and not at all in EBs may be erroneous.

      (5) Figure 5C-F: These results relating to Delta endocytosis are not convincing. The data in Fig 5C are not clear and not quantitated, and the data in Figure 5F are so widely scattered that it seems these co-localizations are difficult to measure. The authors should either remove these data, improve them, or soften the conclusions taken from them. Moreover, it is unclear how the experiments tracing Delta internalization (Fig 5C) could actually work. This is because for this method to work, the anti-Dl antibody would have to pass through the visceral muscle before binding Dl on the ISC cell surface. To my knowledge, antibody transcytosis is not a common phenomenon.

      (6) It is unclear whether MacCer regulates Dl-Notch signaling by modifying Dl directly or by influencing the general endocytic recycling pathway. The authors say they observe increased Dl accumulation in Rab5+ early endosomes but not in Rab7+ late endosomes upon GlcT depletion, suggesting that the recycling endosome pathway, which retrieves Dl back to the cell surface, may be impaired by GlcT loss. To test this, the authors could examine whether recycling endosomes (marked by Rab4 and Rab11) are disrupted in GlcT mutants. Rab11 has been shown to be essential for recycling endosome function in fly ISCs.

      (7) It remains unclear whether Dl undergoes post-translational modification by MacCer in the fly gut. At a minimum, the authors should provide biochemical evidence (e.g., Western blot) to determine whether GlcT depletion alters the protein size of Dl.

      (8) It is unfortunate that GlcT doesn't affect Notch signaling in other organs of the fly. This brings into question the Delta trafficking model and the authors should note this. Also, the clonal marker in Figure 6C is not clear.

      (9) The authors state that loss of UGCG in the mouse small intestine results in a reduced ISC count. However, in Supplementary Figure C3, Ki67, a marker of ISC proliferation, is significantly increased in UGCG-CKO mice. This contradiction should be clarified. The authors might repeat this experiment using an alternative ISC marker, such as Lgr5.

    5. Author response:

      We would like to express our gratitude to all three reviewers for their time and valuable feedback on the manuscript. Below, we provide our point-by-point responses to their comments. Additionally, we summarize here the experiments we plan to conduct in accordance with the reviewers' suggestions:

      Revision plan 1. To include live imaging of Dl/Notch trafficking in normal and GlcT mutant ISCs.

      We agree that the effect of GlcT mutation on Dl trafficking was not convincingly demonstrated in our previous work. Although we attempted live imaging of the intestine using GFP tagged at the C-terminal of Dl, the fluorescent signal was regrettably too weak for reliable capture. In this revision, we will optimize the imaging conditions to determine if this issue can be resolved. Alternatively, we will transiently express GFP/RFP-tagged Dl in both normal and mutant ISCs to investigate the trafficking dynamics through live imaging.

      Revision plan 2. To update and improve the presentation of the data regarding the features of early/late/recycling endosomes in GlcT mutant ISCs.

      Our analysis of Rab5 and Rab7 endosomes in both normal and GlcT mutant ISCs revealed that Dl tends to accumulate in Rab5 endosomes in GlcT mutant ISCs. To strengthen our findings, we will include additional quantitative data and conduct further analysis on recycling endosomes labeled with Rab11-GFP. We acknowledge that this portion of the data is not entirely convincing, and in accordance with the reviewers' suggestions, we will revise our conclusions to present a more tempered interpretation.

      Revision plan 3. To include western blot analysis of Dl in normal and GlcT mutant ISCs.

      While we propose that MacCer may function as a component of lipid rafts, facilitating the anchorage of Dl on the membrane and its proper endocytosis, it is also possible that it acts as a substrate for the modification of Dl, which is essential for its functionality. To investigate this further, we will conduct Western blot analysis to determine whether the depletion of GlcT alters the protein size of Dl.

      Please find our detailed point-by-point responses below.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      From a forward genetic mosaic mutant screen using EMS, the authors identify mutations in glucosylceramide synthase (GlcT), a rate-limiting enzyme for glycosphingolipid (GSL) production, that result in EE tumors. Multiple genetic experiments strongly support the model that the mutant phenotype caused by GlcT loss is due to a failure to convert ceramide into glucosylceramide. Further genetic evidence suggests that Notch signaling is compromised in the ISC lineage and may affect the endocytosis of Delta. Loss of GlcT does not affect wing development or oogenesis, suggesting tissue-specific roles for GlcT. Finally, an increase in goblet cells in UGCG knockout mice, not previously reported, suggests a conserved role for GlcT in Notch signaling in intestinal cell lineage specification.

      Strengths:

      Overall, this is a well-written paper with multiple well-designed and executed genetic experiments that support a role for GlcT in Notch signaling in the fly and mammalian intestine. I do, however, have a few comments below.

      Weaknesses:

      (1) The authors bring up the intriguing idea that GlcT could be a way to link diet to cell fate choice. Unfortunately, there are no experiments to test this hypothesis.

      We indeed attempted to establish an assay to investigate the impact of various diets (such as high-fat, high-sugar, or high-protein diets) on the fate choice of ISCs. Subsequently, we intended to examine the potential involvement of GlcT in this process. However, we observed that the number or percentage of EEs varies significantly among individuals, even among flies with identical phenotypes subjected to the same nutritional regimen. We suspect that the proliferative status of ISCs and the turnover rate of EEs may significantly influence the number of EEs present in the intestinal epithelium, complicating the interpretation of our results. Consequently, we are unable to conduct this experiment at this time. The hypothesis suggesting that GlcT may link diet to cell fate choice remains an avenue for future experimental exploration.

      (2) Why do the authors think that UGCG knockout results in goblet cell excess and not in the other secretory cell types?

      This is indeed an interesting point. In the mouse intestine, it is well-documented that the knockout of Notch receptors or Delta-like ligands results in a classic phenotype characterized by goblet cell hyperplasia, with little impact on the other secretory cell types. This finding aligns very well with our experimental results, as we noted that the numbers of Paneth cells and enteroendocrine cells appear to be largely normal in UGCG knockout mice. By contrast, increases in other secretory cell types are typically observed under conditions of pharmacological inhibition of the Notch pathway.

      (3) The authors should cite other EMS mutagenesis screens done in the fly intestine.

      To our knowledge, the EMS screen on chromosome 2L conducted in Allison Bardin’s lab is the only one prior to this work, and it led to two publications (Perdigoto et al., 2011; Gervais et al., 2019). We will include citations for both papers in the revised manuscript.

      (4) The absence of a phenotype using NRE-Gal4 is not convincing. This is because the delay in its expression could be after the requirement for the affected gene in the process being studied. In other words, sufficient knockdown of GlcT by RNAi would not be achieved until after the relevant signaling between the EB and the ISC occurred. Dl-Gal4 is problematic as an ISC driver because Dl is expressed in the EEP.

      We agree that the lack of an observable phenotype using NRE-Gal4 might be attributed to a delay in its expression, which could result in missing the critical window necessary for effective GlcT knockdown. Consequently, we cannot rule out the possibility that GlcT may also play a role in early EBs or EEPs. We will revise our manuscript to present a more cautious conclusion on this issue.

      (5) The difference in Rab5 between control and GlcT-IR was not that significant. Furthermore, any changes could be secondary to increases in proliferation.

      We agree that it is possible that the observed increase in proliferation could influence the number of Rab5+ endosomes, and we will temper our conclusions on this aspect accordingly. However, it is important to note that, although the difference in Rab5+ endosomes between the control and GlcT-IR conditions appeared mild, it was statistically significant and reproducible. As we have indicated earlier, we plan to further analyze Rab11+ endosomes, as this additional analysis may provide further support for our previous conclusions.

      Reviewer #2 (Public review):

      Summary:

      This study genetically identifies two key enzymes involved in the biosynthesis of glycosphingolipids, GlcT and Egh, which act as tumor suppressors in the adult fly gut. Detailed genetic analysis indicates that a deficiency in Mactosyl-ceramide (Mac-Cer) is causing tumor formation. Analysis of a Notch transcriptional reporter further indicates that the lack of Mac-Cer is associated with reduced Notch activity in the gut, but not in other tissues.

      Addressing how a change in the lipid composition of the membranes might lead to defective Notch receptor activation, the authors studied the endocytic trafficking of Delta and claimed that internalized Delta appeared to accumulate faster into endosomes in the absence of Mac-Cer. Further analysis of Delta steady-state accumulation in fixed samples suggested a delay in the endosomal trafficking of Delta from Rab5+ to Rab7+ endosomes, which was interpreted to suggest that the inefficient, or delayed, recycling of Delta might cause a loss in Notch receptor activation.

      Finally, the histological analysis of mouse guts following the conditional knock-out of the GlcT gene suggested that Mac-Cer might also be important for proper Notch signaling activity in that context.

      Strengths:

      The genetic analysis is of high quality. The finding that a Mac-Cer deficiency results in reduced Notch activity in the fly gut is important and fully convincing.

      The mouse data, although preliminary, raised the possibility that the role of this specific lipid may be conserved across species.

      Weaknesses:

      This study is not, however, without caveats and several specific conclusions are not fully convincing.

      First, the conclusion that GlcT is specifically required in Intestinal Stem Cells (ISCs) is not fully convincing for technical reasons: NRE-Gal4 may be less active in GlcT mutant cells, and the knock-down of GlcT using Dl-Gal4ts may not be restricted to ISCs given the perdurance of Gal4 and of its downstream RNAi.

      As previously mentioned, we acknowledge that a role for GlcT in early EBs or EEPs cannot be completely ruled out. We will revise our manuscript to present a more cautious conclusion and explicitly describe this possibility in the updated version.

      Second, the results from the antibody uptake assays are not clear: i) the levels of internalized Delta were not quantified in these experiments; ii) additionally, live guts were incubated with anti-Delta for 3 hr. This long period of incubation indicates that the observed results may not necessarily reflect the dynamics of endocytosis of antibody-bound Delta, but might also inform about the distribution of intracellular Delta following the internalization of unbound anti-Delta. It would thus be interesting to examine the level of internalized Delta in experiments with shorter incubation time.

      We thank the reviewer for these excellent questions. In our antibody uptake experiments, we noted that Dl reached its peak accumulation after a 3-hour incubation period. We recognize that quantifying internalized Dl would enhance our analysis, and we will include the corresponding statistical graphs in the revised version of the manuscript. In addition, we agree that during the 3-hour incubation, the potential internalization of unbound anti-Dl cannot be ruled out, as it may influence the observed distribution of intracellular Dl. To address this concern, we plan to supplement our findings with live imaging experiments to capture the dynamics of Dl endocytosis in GlcT mutant ISCs.

      Overall, the proposed working model needs to be solidified as important questions remain open, including: Is the endo-lysosomal system, i.e. steady-state distribution of endo-lysosomal markers, affected by the Mac-Cer deficiency? Is the trafficking of Notch also affected by the Mac-Cer deficiency? Is the rate of Delta endocytosis also affected by the Mac-Cer deficiency? Are the levels of cell-surface Delta reduced upon the loss of Mac-Cer?

      Regarding the impact on the endo-lysosomal system, this is indeed an important aspect to explore. While we did not conduct experiments specifically designed to evaluate the steady-state distribution of endo-lysosomal markers, our analyses utilizing Rab5-GFP overexpression and Rab7 staining did not indicate any significant differences in endosome distribution in MacCer deficient conditions. Moreover, we still observed high expression of the NRE-LacZ reporter specifically at the boundaries of clones in GlcT mutant cells (Fig. 4A), indicating that GlcT mutant EBs remain responsive to Dl produced by normal ISCs located right at the clone boundary. Therefore, we propose that MacCer deficiency may specifically affect Dl trafficking without impacting Notch trafficking.

      In our 3-hour antibody uptake experiments, we observed a notable decrease in cell-surface Dl, which was accompanied by an increase in intracellular accumulation. These findings collectively suggest that Dl may be unstable on the cell surface, leading to its accumulation in early endosomes.

      Third, while the mouse results are potentially interesting, they seem to be relatively preliminary, and future studies are needed to test whether the level of Notch receptor activation is reduced in this model.

      In the mouse small intestine, Olfm4 is a well-established target gene of the Notch signaling pathway, and its staining provides a reliable indication of Notch pathway activation. While we attempted to evaluate Notch activation using additional markers, such as Hes1 and NICD, we encountered difficulties, as the corresponding antibody reagents did not perform well in our hands. Despite these challenges, we believe that our findings with Olfm4 provide an important starting point for future investigation.

      Reviewer #3 (Public review):

      Summary:

      In this paper, Tang et al report the discovery of a glucosylceramide synthase gene, GlcT, which they found in a genetic screen for mutations that generate tumorous growth of stem cells in the gut of Drosophila. The screen was expertly done using a classic mutagenesis/mosaic method. Their initial characterization of the GlcT alleles, which generate endocrine tumors much like mutations in the Notch signaling pathway, is also very nice. Tang et al checked other enzymes in the glycosylceramide pathway and found that the loss of one gene just downstream of GlcT (Egh) gives similar phenotypes to GlcT, whereas three genes further downstream do not replicate the phenotype. Remarkably, dietary supplementation with a predicted GlcT/Egh product, Lactosyl-ceramide, was able to substantially rescue the GlcT mutant phenotype. Based on the phenotypic similarity of the GlcT and Notch phenotypes, the authors show that activated Notch is epistatic to GlcT mutations, suppressing the endocrine tumor phenotype, and that GlcT mutant clones have reduced Notch signaling activity. Up to this point, the results are all clear, interesting, and significant. Tang et al then go on to investigate how GlcT mutations might affect Notch signaling, and present results suggesting that GlcT mutation might impair the normal endocytic trafficking of Delta, the Notch ligand. These results (Fig X-XX), unfortunately, are less than convincing; either more conclusive data should be brought to support the Delta trafficking model, or the authors should limit their conclusions regarding how GlcT loss impairs Notch signaling. Given the results shown, it's clear that GlcT affects EE cell differentiation, but whether this is via directly altering Dl/N signaling is not so clear, and other mechanisms could be involved. Overall the paper is an interesting, novel study, but it lacks somewhat in providing mechanistic insight. With conscientious revisions, this could be addressed. 
We list below specific points that Tang et al should consider as they revise their paper.

      Strengths:

      The genetic screen is excellent.

      The basic characterization of GlcT phenotypes is excellent, as is the downstream pathway analysis.

      Weaknesses:

      (1) Lines 147-149, Figure 2E: here, the study would benefit from quantitations of the effects of loss of brn, B4GalNAcTA, and a4GT1, even though they appear negative.

      We will incorporate the quantifications for the effects of the loss of brn, B4GalNAcTA, and a4GT1 in the updated Figure 2.

      (2) In Figure 3, it would be useful to quantify the effects of LacCer on proliferation. The suppression result is very nice, but only effects on Pros+ cell numbers are shown.

      We will add quantifications of the number of EEs per clone to the updated Figure 3.

      (3) In Figure 4A/B we see less NRE-LacZ in GlcT mutant clones. Are the data points in Figure 4B per cell or per clone? Please note. Also, there are clearly a few NRE-LacZ+ cells in the mutant clone. How does this happen if GlcT is required for Dl/N signaling?

      In Figure 4B, the data points represent the fluorescence intensity per single cell within each clone. It is true that a few NRE-LacZ+ cells can still be observed within the mutant clone; however, this does not contradict our conclusion. As noted, high expression of the NRE-LacZ reporter was specifically observed around the clone boundaries in MacCer deficient cells (Fig. 4A), indicating that the mutant EBs can normally receive Dl signal from the normal ISCs located at the clone boundary and activate the Notch signaling pathway. Therefore, we believe that, although affecting Dl trafficking, MacCer deficiency does not significantly affect Notch trafficking.

      (4) Lines 222-225, Figure 5AB: The authors use the NRE-Gal4ts driver to show that GlcT depletion in EBs has no effect. However, this driver is not activated until well into the process of EB commitment, and RNAi's take several days to work, so the authors' conclusion that GlcT is "specifically required in ISCs" and not at all in EBs may be erroneous.

      As previously mentioned, we acknowledge that a role for GlcT in early EBs or EEPs cannot be completely ruled out. We will revise our manuscript to present a more cautious conclusion and describe this possibility in the updated version.

      (5) Figure 5C-F: These results relating to Delta endocytosis are not convincing. The data in Fig 5C are not clear and not quantitated, and the data in Figure 5F are so widely scattered that it seems these co-localizations are difficult to measure. The authors should either remove these data, improve them, or soften the conclusions taken from them. Moreover, it is unclear how the experiments tracing Delta internalization (Fig 5C) could actually work. This is because for this method to work, the anti-Dl antibody would have to pass through the visceral muscle before binding Dl on the ISC cell surface. To my knowledge, antibody transcytosis is not a common phenomenon.

      We thank the reviewer for these insightful comments and suggestions. In our in vivo experiments, we observed increased co-localization of Rab5 and Dl in GlcT mutant ISCs, indicating that Dl trafficking is delayed at the transition to Rab7⁺ late endosomes, a finding that is further supported by our antibody uptake experiments. We acknowledge that the data presented in Fig. 5C are not fully quantified and that the co-localization data in Fig. 5F may appear somewhat scattered; therefore, we will include additional quantification and enhance the data presentation in the revised manuscript.

      Regarding the concern about antibody internalization, we appreciate this point. We currently do not know if the antibody reaches the cell surface of ISCs by passing through the visceral muscle or via other routes. Given that the experiment was conducted with fragmented gut, it is possible that the antibody may penetrate into the tissue through mechanisms independent of transcytosis.

      As mentioned earlier, we plan to supplement our findings with live imaging experiments to investigate the dynamics of Dl/Notch endocytosis in both normal and GlcT mutant ISCs. In any case, given the technical challenges and potential pitfalls associated with these assays, we agree that this part of the data is not fully convincing, and we will present a more cautious conclusion in the revised manuscript.

      (6) It is unclear whether MacCer regulates Dl-Notch signaling by modifying Dl directly or by influencing the general endocytic recycling pathway. The authors say they observe increased Dl accumulation in Rab5+ early endosomes but not in Rab7+ late endosomes upon GlcT depletion, suggesting that the recycling endosome pathway, which retrieves Dl back to the cell surface, may be impaired by GlcT loss. To test this, the authors could examine whether recycling endosomes (marked by Rab4 and Rab11) are disrupted in GlcT mutants. Rab11 has been shown to be essential for recycling endosome function in fly ISCs.

      We agree that assessing the state of recycling endosomes, especially by using markers such as Rab11, would be valuable in determining whether MacCer regulates Dl-Notch signaling by directly modifying Dl or by influencing the broader endocytic recycling pathway. We will incorporate these experiments into our future experimental plans to further characterize Dl trafficking in GlcT mutant ISCs.

      (7) It remains unclear whether Dl undergoes post-translational modification by MacCer in the fly gut. At a minimum, the authors should provide biochemical evidence (e.g., Western blot) to determine whether GlcT depletion alters the protein size of Dl.

      While we propose that MacCer may function as a component of lipid rafts, facilitating Dl membrane anchorage and endocytosis, we also acknowledge the possibility that MacCer could serve as a substrate for protein modifications of Dl necessary for its proper function. Conducting biochemical analyses to investigate potential post-translational modifications of Dl by MacCer would indeed provide valuable insights. To address this, we will incorporate Western blot analysis into our experimental plan to determine whether GlcT depletion affects the protein size of Dl.

      (8) It is unfortunate that GlcT doesn't affect Notch signaling in other organs of the fly. This brings into question the Delta trafficking model, and the authors should note this. Also, the clonal marker in Figure 6C is not clear.

      In the revised working model, we will explicitly specify that the events occur in intestinal stem cells. Regarding Figure 6C, we will delineate the clone with a white dashed line to enhance its clarity and visual comprehension.

      (9) The authors state that loss of UGCG in the mouse small intestine results in a reduced ISC count. However, in Supplementary Figure C3, Ki67, a marker of ISC proliferation, is significantly increased in UGCG-CKO mice. This contradiction should be clarified. The authors might repeat this experiment using an alternative ISC marker, such as Lgr5.

      Previous studies have indicated that dysregulation of the Notch signaling pathway can result in a reduction in the number of ISCs. While we did not perform a direct quantification of ISC numbers in our experiments, our olfm4 staining—which serves as a reliable marker for ISCs—demonstrates a clear reduction in the number of positive cells in UGCG-CKO mice.

      The increased Ki67 signal we observed reflects enhanced proliferation in the transit-amplifying region, and it does not directly indicate an increase in ISC number. Therefore, in UGCG-CKO mice, we observe a decrease in the number of ISCs, while there is an increase in transit-amplifying (TA) cells (progenitor cells). This increase in TA cells is probably a secondary consequence of the loss of barrier function associated with the UGCG knockout.

    1. eLife Assessment

      TrASPr is an important contribution that leverages transformer models focused on regulatory regions to enhance predictions of tissue-specific splicing events. The evidence supporting the authors' claims is convincing, with rigorous analyses demonstrating improved performance relative to existing models, although some aspects of the evaluation would benefit from further clarification. This work will be of particular interest to researchers in computational genomics and RNA biology, as it offers both a refined predictive model and a new tool for designing RNA sequences for targeted splicing outcomes.

    2. Reviewer #1 (Public review):

      Summary:

      The authors propose a transformer-based model for the prediction of condition- or tissue-specific alternative splicing and demonstrate its utility in the design of RNAs with desired splicing outcomes, which is a novel application. The model is compared to relevant existing approaches (Pangolin and SpliceAI) and the authors clearly demonstrate its advantage. Overall, a compelling method that is well thought out and evaluated.

      Strengths:

      (1) The model is well thought out: rather than modeling a cassette exon using a single generic deep learning model as has been done e.g. in SpliceAI and related work, the authors propose a modular architecture that focuses on different regions around a potential exon skipping event, which enables the model to learn representations that are specific to those regions. Because each component in the model focuses on a fixed length short sequence segment, the model can learn position-specific features. Another difference compared to Pangolin and SpliceAI which are focused on modeling individual splice junctions is the focus on modeling a complete alternative splicing event.

      (2) The model is evaluated in a rigorous way - it is compared to the most relevant state-of-the-art models, uses machine learning best practices, and an ablation study demonstrates the contribution of each component of the architecture.

      (3) Experimental work supports the computational predictions.

      (4) The authors use their model for sequence design to optimize splicing outcomes, which is a novel application.

      Weaknesses:

      No weaknesses were identified by this reviewer, but I have the following comments:

      (1) I would be curious to see evidence that the model is learning position-specific representations.

      (2) The transformer encoders in TrASPr model sequences with a rather limited sequence size of 200 bp; therefore, for long introns, the model will not have good coverage of the intronic sequence. This is not expected to be an issue for exons.

      (3) In the context of sequence design, creating a desired tissue- or condition-specific effect would likely require disrupting or creating motifs for splicing regulatory proteins. In your experiments for neuronal-specific Daam1 exon 16, have you seen evidence for that? Most of the edits are close to splice junctions, but a few are further away.

      (4) For sequence design of a tissue- or condition-specific effect in neuronal-specific Daam1 exon 16, the upstream exonic splice junction had the most sequence edits. Is that a general observation? How about the relative importance of the four transformer regions in TrASPr prediction performance?

      (5) The idea of lightweight transformer models is compelling, and is widely applicable. It has been used elsewhere. One paper that came to mind in the protein realm:

      Singh, Rohit, et al. "Learning the language of antibody hypervariability." Proceedings of the National Academy of Sciences 122.1 (2025): e2418918121.

    3. Reviewer #2 (Public review):

      Summary:

      The authors present a transformer-based model, TrASPr, for the task of tissue-specific splicing prediction (with experiments primarily focused on the case of cassette exon inclusion) as well as an optimization framework (BOS) for the task of designing RNA sequences for desired splicing outcomes.

      For the first task, the main methodological contribution is to train four transformer-based models on the 400bp regions surrounding each splice site, the rationale being that this is where most splicing regulatory information is. In contrast, previous work trained one model on a long genomic region. This new design should help the model capture more easily interactions between splice sites. It should also help in cases of very long introns, which are relatively common in the human genome.

      TrASPr's performance is evaluated in comparison to previous models (SpliceAI, Pangolin, and SpliceTransformer) on numerous tasks including splicing predictions on GTEx tissues, ENCODE cell lines, RBP KD data, and mutagenesis data. The scope of these evaluations is ambitious; however, significant details on most of the analyses are missing, making it difficult to evaluate the strength of the evidence. Additionally, state-of-the-art models (SpliceAI and Pangolin) are reported to perform extremely poorly in some tasks, which is surprising in light of previous reports of their overall good prediction accuracy; the reasoning for this lack of performance compared to TrASPr is not explored.

      In the second task, the authors combine Latent Space Bayesian Optimization (LSBO) with a Transformer-based variational autoencoder to optimize RNA sequences for a given splicing-related objective function. This method (BOS) appears to be a novel application of LSBO, with promising results on several computational evaluations and the potential to be impactful on sequence design for both splicing-related objectives and other tasks.

      Strengths:

      (1) A novel machine learning model for an important problem in RNA biology with excellent prediction accuracy.

      (2) Instead of being based on a generic design as in previous work, the proposed model incorporates biological domain knowledge (that regulatory information is concentrated around splice sites). This way of using inductive bias can be important to future work on other sequence-based prediction tasks.

      Weaknesses:

      (1) Most of the analyses presented in the manuscript are described in broad strokes and are often confusing. As a result, it is difficult to assess the significance of the contribution.

      (2) As more and more models are being proposed for splicing prediction (SpliceAI, Pangolin, SpliceTransformer, TrASPr), there is a need for establishing standard benchmarks, similar to those in computer vision (ImageNet). Without such benchmarks, it is exceedingly difficult to compare models. For instance, Pangolin was apparently trained on a different dataset (Cardoso-Moreira et al. 2019), and using a different processing pipeline (based on SpliSER) than the ones used in this submission. As a result, the inferior performance of Pangolin reported here could potentially be due to subtle distribution shifts. The authors should add a discussion of the differences in the training set, and whether they affect your comparisons (e.g., in Figure 2). They should also consider adding a table summarizing the various datasets used in their previous work for training and testing. Publishing their training and testing datasets in an easy-to-use format would be a fantastic contribution to the community, establishing a common benchmark to be used by others.

      (3) Related to the previous point, as discussed in the manuscript, SpliceAI and Pangolin are not designed to predict PSI of cassette exons. Instead, they assign a "splice site probability" to each nucleotide. Converting this to a PSI prediction is not obvious, and the method chosen by the authors (averaging the two probabilities (?)) is likely not optimal. It would be interesting to see what happens if an MLP is used on top of the four predictions (or the outputs of the top layers) from SpliceAI/Pangolin. This could also indicate where the improvement in TrASPr comes from: is it because TrASPr combines information from all four splice sites? Also, consider fine-tuning Pangolin on cassette exons only (as you do for your model).

      (4) L141, "TrASPr can handle cassette exons spanning a wide range of window sizes from 181 to 329,227 bases - thanks to its multi-transformer architecture." This is reported to be one of the primary advantages compared to existing models. Additional analysis should be included on how TrASPr performs across varying exon and intron sizes, with comparison to SpliceAI, etc.

      (5) L171, "training it on cassette exons". This seems like an important point: previous models were trained mostly on constitutive exons, whereas here the model is trained specifically on cassette exons. This should be discussed in more detail.

      (6) L214, ablations of individual features are missing.

      (7) L230, "ENCODE cell lines", it is not clear why other tissues from GTEx were not included.

      (8) L239, it is surprising that SpliceAI performs so badly, and might suggest a mistake in the analysis. Additional analysis and possible explanations should be provided to support these claims. Similarly, the complete failure of SpliceAI and Pangolin is shown in Figure 4d.

      (9) BOS seems like a separate contribution that belongs in a separate publication. Instead, consider providing more details on TrASPr.

      (10) The authors should consider evaluating BOS using Pangolin or SpliceTransformer as the oracle, in order to measure the contribution to the sequence generation task provided by BOS vs TrASPr.

    4. Author response:

      Reviewer #1 (Public review):

      Summary:

      The authors propose a transformer-based model for the prediction of condition- or tissue-specific alternative splicing and demonstrate its utility in the design of RNAs with desired splicing outcomes, which is a novel application. The model is compared to relevant existing approaches (Pangolin and SpliceAI) and the authors clearly demonstrate its advantage. Overall, a compelling method that is well thought out and evaluated.

      Strengths:

      (1) The model is well thought out: rather than modeling a cassette exon using a single generic deep learning model as has been done e.g. in SpliceAI and related work, the authors propose a modular architecture that focuses on different regions around a potential exon skipping event, which enables the model to learn representations that are specific to those regions. Because each component in the model focuses on a fixed length short sequence segment, the model can learn position-specific features. Another difference compared to Pangolin and SpliceAI which are focused on modeling individual splice junctions is the focus on modeling a complete alternative splicing event.

      (2) The model is evaluated in a rigorous way - it is compared to the most relevant state-of-the-art models, uses machine learning best practices, and an ablation study demonstrates the contribution of each component of the architecture.

      (3) Experimental work supports the computational predictions.    

      (4) The authors use their model for sequence design to optimize splicing outcomes, which is a novel application.

      We wholeheartedly thank Reviewer #1 for these positive comments regarding the modeling approach we took to this task and the evaluations we performed. We have put a lot of work and thought into this and it is gratifying to see the results of that work acknowledged like this.

      Weaknesses:

      No weaknesses were identified by this reviewer, but I have the following comments:

      (1) I would be curious to see evidence that the model is learning position-specific representations.

      This is an excellent suggestion to further assess what the model is learning. We have several ideas on how to test this, which we plan to report in the revised version.

      (2) The transformer encoders in TrASPr model sequences with a rather limited sequence size of 200 bp; therefore, for long introns, the model will not have good coverage of the intronic sequence. This is not expected to be an issue for exons.

      Yes, we can stratify predictions by intron length; that is a good suggestion. We will report on that in the revision.

      (3) In the context of sequence design, creating a desired tissue- or condition-specific effect would likely require disrupting or creating motifs for splicing regulatory proteins. In your experiments for neuronal-specific Daam1 exon 16, have you seen evidence for that? Most of the edits are close to splice junctions, but a few are further away.

      That is another good question and suggestion. In the original paper describing the mutation locations, some motif similarities to PTB (CU) and CUG/Mbnl-like elements were noted (Barash et al Nature 2010). We could revisit this now with an RBP motif database such as http://rbpdb.ccbr.utoronto.ca/. We note that ENCODE uses human cell lines and so cannot be used for this, but we will also look for mouse CLIP and KD data supporting such regulatory findings.

      (4) For sequence design of a tissue- or condition-specific effect in neuronal-specific Daam1 exon 16, the upstream exonic splice junction had the most sequence edits. Is that a general observation? How about the relative importance of the four transformer regions in TrASPr prediction performance?

      This is another excellent question that we plan to follow up with matching analysis in the revision.

      (5) The idea of lightweight transformer models is compelling, and is widely applicable. It has been used elsewhere. One paper that came to mind in the protein realm:

      Singh, Rohit, et al. "Learning the language of antibody hypervariability." Proceedings of the National Academy of Sciences 122.1 (2025): e2418918121.

      Yes, we are certainly not the only or the first to advocate for such an approach. We will be sure to make that point clear in the revision and thank the reviewer for the example from a different domain.

      Reviewer #2 (Public review):

      Summary:

      The authors present a transformer-based model, TrASPr, for the task of tissue-specific splicing prediction (with experiments primarily focused on the case of cassette exon inclusion) as well as an optimization framework (BOS) for the task of designing RNA sequences for desired splicing outcomes.

      For the first task, the main methodological contribution is to train four transformer-based models on the 400bp regions surrounding each splice site, the rationale being that this is where most splicing regulatory information is. In contrast, previous work trained one model on a long genomic region. This new design should help the model capture more easily interactions between splice sites. It should also help in cases of very long introns, which are relatively common in the human genome.

      TrASPr's performance is evaluated in comparison to previous models (SpliceAI, Pangolin, and SpliceTransformer) on numerous tasks including splicing predictions on GTEx tissues, ENCODE cell lines, RBP KD data, and mutagenesis data. The scope of these evaluations is ambitious; however, significant details on most of the analyses are missing, making it difficult to evaluate the strength of the evidence. Additionally, state-of-the-art models (SpliceAI and Pangolin) are reported to perform extremely poorly in some tasks, which is surprising in light of previous reports of their overall good prediction accuracy; the reasoning for this lack of performance compared to TrASPr is not explored.

      In the second task, the authors combine Latent Space Bayesian Optimization (LSBO) with a Transformer-based variational autoencoder to optimize RNA sequences for a given splicing-related objective function. This method (BOS) appears to be a novel application of LSBO, with promising results on several computational evaluations and the potential to be impactful on sequence design for both splicing-related objectives and other tasks.

      We thank Reviewer #2 for this detailed summary and positive view of our work. It seems the main issue raised in this summary regards the evaluations: the reviewer finds details of the evaluations missing and finds it surprising that SpliceAI and Pangolin perform poorly on some of the tasks. In general, we made a concerted effort to include the required details, including code and data tables, but will be sure to include more details based on the specific questions/comments listed below. As for the perceived performance issues for Pangolin/SpliceAI, we believe these may stem from our not making clear which tasks those models perform well on and which they do not. We give more details below.

      Strengths:

      (1) A novel machine learning model for an important problem in RNA biology with excellent prediction accuracy.

      (2) Instead of being based on a generic design as in previous work, the proposed model incorporates biological domain knowledge (that regulatory information is concentrated around splice sites). This way of using inductive bias can be important to future work on other sequence-based prediction tasks.

      Weaknesses:

      (1) Most of the analyses presented in the manuscript are described in broad strokes and are often confusing. As a result, it is difficult to assess the significance of the contribution.

      We made an effort to make the tasks specific and detailed, including making the associated code and data available. Still, it is evident from the above comment that Reviewer #2 found this to be lacking. We will review the description and make an effort to improve it given the clarifications we include below.

      (2) As more and more models are being proposed for splicing prediction (SpliceAI, Pangolin, SpliceTransformer, TrASPr), there is a need for establishing standard benchmarks, similar to those in computer vision (ImageNet). Without such benchmarks, it is exceedingly difficult to compare models. For instance, Pangolin was apparently trained on a different dataset (Cardoso-Moreira et al. 2019), and using a different processing pipeline (based on SpliSER) than the ones used in this submission. As a result, the inferior performance of Pangolin reported here could potentially be due to subtle distribution shifts. The authors should add a discussion of the differences in the training set, and whether they affect your comparisons (e.g., in Figure 2). They should also consider adding a table summarizing the various datasets used in their previous work for training and testing. Publishing their training and testing datasets in an easy-to-use format would be a fantastic contribution to the community, establishing a common benchmark to be used by others.

      There are several good points to unpack here. First, we agree that a standard benchmark will be useful to include. We will work to create and include one for the revision. That said, we note that unlike the example given by Reviewer #2 (ImageNet) there are no standards for the splicing prediction tasks. There are actually different task definitions with different input/outputs as we tried to cover briefly in the introduction section. 

      Second, the reviewer raises the usage of different data and distribution shifts as potential reasons for Pangolin's performance differences. We originally evaluated Pangolin after retraining it with MAJIQ-based quantifications and found no significant changes. We will include a more detailed analysis of Pangolin retrained like this in the revision. We also note that Pangolin's original training involved significantly more data, as it was trained on four species with four tissues each, and we only evaluated it on three of those tissues (for human), in exons the authors deemed as test data. That said, we very much agree that retraining Pangolin as mentioned above is warranted, as well as clearly listing what data was used for training as suggested by the reviewer.

      (3) Related to the previous point, as discussed in the manuscript, SpliceAI and Pangolin are not designed to predict PSI of cassette exons. Instead, they assign a "splice site probability" to each nucleotide. Converting this to a PSI prediction is not obvious, and the method chosen by the authors (averaging the two probabilities (?)) is likely not optimal. It would be interesting to see what happens if an MLP is used on top of the four predictions (or the outputs of the top layers) from SpliceAI/Pangolin. This could also indicate where the improvement in TrASPr comes from: is it because TrASPr combines information from all four splice sites? Also, consider fine-tuning Pangolin on cassette exons only (as you do for your model).

      As mentioned above, we originally did try to retrain Pangolin with MAJIQ PSI values without observing much difference, but we will repeat this and include the results in the revision. Combining four different SpliceAI models as proposed by the reviewer seems to amount to a new kind of model, one that takes four large ResNets and combines those with annotation. Related to that, we did try to replace the transformers in our ablation study. The reviewer's suggestion seems like another interesting architecture to try, but since this model does not yet exist, it would likely require some adjustments. Given that, we view adding such a new model architecture as beyond the scope of this work.
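
      To make the conversion discussed in point (3) concrete, the following is a minimal sketch, assuming the simple averaging that the reviewer describes. The function name and its inputs are hypothetical and are not taken from the manuscript's code or from the SpliceAI or Pangolin APIs.

```python
def psi_proxy(acceptor_prob: float, donor_prob: float) -> float:
    """Approximate the PSI of a cassette exon from two splice-site scores.

    acceptor_prob: predicted probability for the exon's 3' splice site.
    donor_prob:    predicted probability for the exon's 5' splice site.
    Both are assumed to lie in [0, 1]; the proxy is their plain average.
    """
    for p in (acceptor_prob, donor_prob):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
    return (acceptor_prob + donor_prob) / 2.0


if __name__ == "__main__":
    # A strong acceptor paired with a weaker donor gives an intermediate proxy.
    print(round(psi_proxy(0.8, 0.6), 3))
```

      Such a proxy ignores the two flanking splice sites, which is one reason the reviewer suggests learning the combination (for example with an MLP over all four predictions) instead.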

      (4) L141, "TrASPr can handle cassette exons spanning a wide range of window sizes from 181 to 329,227 bases - thanks to its multi-transformer architecture." This is reported to be one of the primary advantages compared to existing models. Additional analysis should be included on how TrASPr performs across varying exon and intron sizes, with comparison to SpliceAI, etc.

      Yes, that is a good suggestion, similar to one made by Reviewer #1 as well. We plan to include such analysis in the revision. 

      (5) L171, "training it on cassette exons". This seems like an important point: previous models were trained mostly on constitutive exons, whereas here the model is trained specifically on cassette exons. This should be discussed in more detail.

      Previous models were not trained exclusively on constitutive exons and Pangolin specifically was trained with their version of junction usage across tissues. That said, the reviewer’s point is valid (and similar to ones made above) about a need to have a matched training/testing. As noted above we plan to include Pangolin training on our PSI values for comparison.

      (6) L214, ablations of individual features are missing.

      OK

      (7) L230, "ENCODE cell lines", it is not clear why other tissues from GTEx were not included.

      The task here was to assess predictions in very different conditions; hence we tested on completely different data from human cell lines rather than on similar tissue samples. We can also assess on unseen GTEx tissues.

      (8) L239, it is surprising that SpliceAI performs so badly, and might suggest a mistake in the analysis. Additional analysis and possible explanations should be provided to support these claims. Similarly, the complete failure of SpliceAI and Pangolin is shown in Figure 4d.

      Line 239 refers to predicting relative inclusion levels between competing 3’ and 5’ splice sites. We admit we too expected this to be better for SpliceAI and Pangolin and will be sure to recheck for bugs, but to be fair we are not aware of a similar assessment being done for either of those algorithms (i.e. relative inclusion for 3’ and 5’ alternative splice site events).

      One issue we ran into, reflected in Reviewer #2 comments, is the mix between tasks that SpliceAI and Pangolin excel at and other tasks where they should not necessarily be expected to excel. Both algorithms focus on cryptic splice site creation/disruption. This has been the focus of those papers and subsequent applications.  While Pangolin added tissue specificity to SpliceAI training, the authors themselves admit “...predicting differential splicing across tissues from sequence alone is possible but remains a considerable challenge and requires further investigation”. The actual performance on this task is not included in Pangolin’s main text, but we refer Reviewer #2 to supplementary figure S4 in that manuscript to get a sense of Pangolin’s reported performance on this task. Similar to that, Figure 4d is for predicting *tissue specific* regulators. We do not think it is surprising that SpliceAI (tissue agnostic) and Pangolin (slight improvement compared to SpliceAI in tissue specific predictions) do not perform well on this task.  Similarly, we do not find the results in Figure 4C surprising either. These are for mutations that slightly alter inclusion level of an exon, not something SpliceAI was trained on, as it was simply trained on splice sites yes/no predictions. As noted and we will stress in the revision as well, training Pangolin on this dataset like TrASPr gives similar performance. That is to be expected as well - Pangolin is constructed to capture changes in PSI, those changes are not even tissue specific for CD19 data and the model has no problem/lack of capacity to generalize from the training set just like TrASPr does. In fact, if you only use combination of known mutations seen during training a simple regression model gives correlation of ~92-95% (Cortés-López et al 2022). 
      In summary, we believe that a better understanding of what one can realistically expect from models such as SpliceAI, Pangolin, and TrASPr will go a long way toward having them better understood and used effectively. We will try to improve on that in the revision.

      (9) BOS seems like a separate contribution that belongs in a separate publication. Instead, consider providing more details on TrASPr.

      We thank the reviewer for the suggestion. We agree those are two distinct contributions and we indeed considered having them as two separate papers. However, there is strong coupling between the design algorithm (BOS) and the predictor that enables it (TrASPr). This coupling is both conceptual (TrASPr as a “teacher”) and practical in terms of evaluations. While we use experimental data (experiments done involving Daam1 exon 16, CD19 exon 2) we still rely heavily on evaluations by TrASPr itself. A completely independent evaluation would have required a high-throughput experimental system to assess designs, which is beyond the scope of the current paper. For those reasons we eventually decided to make it into what we hope is a more compelling combined story about generative models for prediction and design of RNA splicing. 

      (10) The authors should consider evaluating BOS using Pangolin or SpliceTransformer as the oracle, in order to measure the contribution to the sequence generation task provided by BOS vs TrASPr.

      We can definitely see the logic behind trying BOS with different predictors. That said, as we note above, most of the BOS evaluations are based on the “teacher”. As such, it is unclear what value replacing the teacher would bring. We also note that, given this limitation, we focus mostly on evaluations in comparison to existing approaches (genetic algorithm or random mutations as a strawman).

    1. eLife Assessment

      Fleming et al sought to better understand DNAJC7's function in motor neurons as mutations in this gene have been associated with amyotrophic lateral sclerosis (ALS). Using iPSC-derived motor neurons, interactome, and transcriptomic data, they provide solid evidence that loss-of-function mutations in DNAJC7 disrupt RNA binding proteins and resistance to proteasomal stress. These important findings advance our understanding of DNAJC7 in motor neurons while providing clues to how its loss may be causal for ALS; nonetheless, the experiments were performed with a single iPSC line, while at least 3 are deemed to be required to validate the results. Furthermore, the mechanistic evidence is still incomplete with respect to how DNAJC7 mutations lead to HSF1 impaired activity, and whether it is direct or not.

    2. Reviewer #1 (Public review):

      Summary

      Fleming et al. present the first, proteomics-based attempt to identify the possible mechanism of action of ALS-linked DNAJC7 molecular chaperone in pathology. Impressively, it is the first report of DNAJC7 interactome studies, using a suitable iPSC-derived lower motor neuron model. Using a co-immunoprecipitation approach the authors identified that the interactome of DNAJC7 is predominantly composed of proteins engaged in response to stress, but also that this interactome is enriched in RNA-binding proteins. The authors also created a DNAJC7 haploinsufficiency cellular model and show the resulting increased insolubility of HNRNPU protein which causes disruptions in its functionality as shown by analysis of its transcriptional targets. Finally, this study uses pharmacological agents to test the effect of decreased DNAJC7 expression on cell response to proteotoxic stress and finds evidence that DNAJC7 regulates the activation of Heat shock factor 1 (HSF1) protein upon stress conditions.

      Strengths

      (1) This study uses the best model so far to study the interactome and possible mechanism of action of the DNAJC7 molecular chaperone in an iPSC-derived cellular model of motor neurons. Furthermore, the authors also looked into available transcriptome databases of ALS patient samples to further test whether their findings may have relevance to pathology.

      (2) The extent to which the authors are explicit about the sample sizes, protocols, and statistical tests used throughout this manuscript, should be applauded. This will help the whole field in their efforts to reliably replicate the results in this study.

      Weaknesses

      (1) The most significant caveat of interactome experiments inherently comes from the method of choice. It is possible that by using the co-purification approach of DNAJC7 IP the resulting pool of binding partners is depleted in proteins that interact with DNAJC7 weakly or transiently. An alternative approach presumably more sensitive towards weaker binders could use the TurboID-based proximity-labeling method.

      (2) The authors mention in Results (and Figure 2D) that HNRNPA1 was identified as DNAJC7-interacting protein in their co-IP experiments, however, an identifier for this protein cannot be found in Figure 1C and Table S1 listing the proteomics results. Could the authors appropriately update Figure 1C and Table S1, or if HNRNPA1 wasn't really a hit then remove it from listed HNRNPs?

      (3) No further validation of DNAJC7-interacting proteins from the heat-shock protein (HSP) family. Current validation of mass spectrometry-identified proteins comes from IP-western blots with antibodies against HSPs. It would be interesting to further inspect possible interactions of these proteins by inspecting co-localization with immunocytochemistry.

      (4) Similarly, the observation of DNAJC7 haploinsufficiency causing an increase in HNRNPU insolubility could be also easily further confirmed by checking for the emergence of "puncta" under a fluorescence microscope, in addition to provided WB experiments from MN lysates.

      (5) I would like to recommend the authors to also provide with this manuscript a complete dataset (possibly in the form of a table, presented similarly as Table S1) resulting from experiments presented in Figures 2F and S2D. The information on upregulated and downregulated targets in their DNAJC7 haploinsufficiency model would be a valuable resource for the field and enable further investigations.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript titled "The ALS-associated co-chaperone DNAJC7 mediates neuroprotection against proteotoxic stress by modulating HSF1 activity" describes experiments carried out in iPS cells re-differentiated into motor neurons (iNeurons, MNs) seeking to assess the functions of the J protein DnaJC7 in proteostasis. This study also investigates how an ALS-associated mutant variant (R156X) alters DnaJC7 function.

      The proteomic studies identify proteins interacting with DnaJC7. Using mRNA profiling in haplo-insufficient cells (+/R156X) compared to wild-type cells, the study seeks to identify pathways modulated by partial loss of DnaJC7 function. Studies in the DnaJC7 haplo-insufficient cells also indicate changes in the properties of ALS-associated proteins, such as HNRNPU and Matrin3, both of which are involved in the regulation of gene expression. The study also shows data indicating that DnaJC7 haploinsufficiency sensitizes cells to proteostatic stress induced by proteasome inhibition by MG132 and Hsp90 inhibition by Ganetespib. Lastly, the study investigates how DnaJC7 modulates the activity of the heat shock transcription factor (Hsf1) and thus the heat shock response.

      Strengths:

      The manuscript is well presented and most of the data is of high quality and convincing. The figures and supplementary figures are clear and easy to follow.

      This study overall provides important new insights into a mostly underexplored molecular co-chaperone and its role in proteostasis. The proteomic and transcriptomic experiments certainly advance our understanding of DnaJC7. The MN model is well-suited for these studies addressing the role of DnaJC7, particularly regarding ALS. The haplo-insufficient MNs are also a suitable model to study a potential loss of function mechanism caused by (some) fALS-associated mutants in ALS, such as the R156X mutation used here.

      Since so little is known about DnaJC7 function, the exploratory approaches applied here are particularly useful.

      Weaknesses:

      Without follow-up studies, however, e.g., with select interacting proteins, the study provides merely a descriptive list of possible interactions without mechanistic insights. Also, most interactions have not been extensively (only a few examples) validated by other methods or individual experiments.

      A major limitation of the study in its current form is that none of the experimental approaches allow for assessing the specific functions of JC7. In the absence of specificity controls, e.g., other J proteins or HOP, which, like DnaJC7, contains TPR domains and can interact with Hsp70 and Hsp90, it remains unclear if the proposed functions of DnaJC7 are specific/unique or shared by other J proteins or molecular chaperones. Accordingly, it would be highly informative to add experiments to assess if some of the reported DnaJC7 protein-protein interactions and the transcriptional alterations in haplo-insufficient cells are DnaJC7-specific or also occur with other J proteins or molecular chaperones. This seems particularly important to discern specific DnaJC7 functions from general effects caused by impaired proteostasis.

      It would be informative to explore how cellular stress (e.g., MG132 treatment) alters DnaJC7 interactions with other proteins (J proteins, HOP), ideally in additional/comparative proteomic studies.

      The mechanism underlying the proposed regulation of Hsf1 by DnaJC7 is not quite clear to me (Figures 4 A-I). There is no evidence of a direct physical interaction between DnaJC7 and Hsf1 in the proteomic data or elsewhere. It seems plausible that Hsf1/HSR dysregulation in the haplo-insufficient cells might be due to rather indirect effects, e.g., increased protein misfolding. Also, additional data showing differential activation of Hsf1 in +/+ versus +/- cells would strengthen this part, e.g. showing differences in Hsf1 trimerization, Hsp70 interactions, nuclear localization, etc.

      The manuscript might also benefit from considering the literature showing an unusually inactive HSR and Hsf1 activity in motor neurons (e.g. published by the Durham lab).

      The correlation with transcriptomic data from ALS patients compared to neurotypical controls (Figures 4 L, M) suggesting a direct role of Hsf1/HSR seems unlikely at this point. In my view, the transcriptional dysregulation in ALS patients could be unrelated to Hsf1 dysregulation and caused by rather non-specific effects of neuronal decay in ALS.

    4. Reviewer #3 (Public review):

      Summary:

      Fleming et al sought to better understand DNAJC7's function in motor neurons as mutations in this gene have been associated with amyotrophic lateral sclerosis (ALS). The research question is relevant and important. The authors use an induced pluripotent stem cell (iPSC) line to derive motor neurons (iMNs) finding that DNAJC7 interacts with RNA-binding proteins (RBP) in wild-type cells and a truncated mutant DNAJC7[R156*] disrupts the RBP, hnRNPU, by promoting its accumulation into insoluble fractions. Given that DNAJC7 is predicted to regulate stress responses, the authors then find that DNAJC7[R156*] expression sensitizes the iMNs to proteasomal stress by disrupting the expression of the key heat stress response regulator, HSF1. These findings support that loss-of-function mutations in DNAJC7 will indeed sensitize motor neurons to proteotoxic stress, potentially driving ALS. The association with RBPs, which routinely are found to be disrupted in ALS, is of interest and warrants further study.

      Strengths:

      (1) The research question is relevant and important. The authors provide interesting data that DNAJC7 mutations impact two important features in ALS, the dysregulation of RNA binding proteins and the sensitivity of motor neurons to proteotoxic stress.

      (2) The authors provide solid data to support their findings and the assays are appropriate.

      Weaknesses:

      (1) The authors rely on a single iPSC line throughout the text, using the same line to make the mutation-carrying cells. iPSCs are highly variable and at minimum 3 lines, typically 5 lines, should be used to define consistent findings. This work would be greatly strengthened if 3 or more lines were used to confirm consistent effects. This is particularly concerning given that iPSCs were differentiated using growth factors versus genetic induction. Growth-factor-based differentiations are more variable.

      (2) The authors argue that HSF1 and its targets are downregulated in sporadic ALS and mutant C9orf72 ALS. The first concern is that these transcriptomics data were derived from cortical tissue which does not contain motor neurons (Pineda et al. 2024 Cell 187: 1971-1989.e1916). The second concern is that the inclusion of C9orf72 mutant tissue is not well justified as (1) this mutation is associated with an upregulation of HSF1 and its targets in patients (Mordes et al, Acta Neuropathol Commun 2018 6(1):55; Lee et al Neuron 2023 111(9):1381-1390) and (2) the C9orf72 mutation is associated with an ALS/FTD spectrum disorder defined by TDP-43 pathology. Disease mechanisms associated with this spectrum disorder may not overlap with traditional ALS which is typically defined by SOD1 pathology.

      (3) As a whole, the findings are mechanistically disjointed, and additional experiments or discussion would help to connect the dots a bit more.

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary

      Fleming et al. present the first, proteomics-based attempt to identify the possible mechanism of action of ALS-linked DNAJC7 molecular chaperone in pathology. Impressively, it is the first report of DNAJC7 interactome studies, using a suitable iPSC-derived lower motor neuron model. Using a co-immunoprecipitation approach the authors identified that the interactome of DNAJC7 is predominantly composed of proteins engaged in response to stress, but also that this interactome is enriched in RNA-binding proteins. The authors also created a DNAJC7 haploinsufficiency cellular model and show the resulting increased insolubility of HNRNPU protein which causes disruptions in its functionality as shown by analysis of its transcriptional targets. Finally, this study uses pharmacological agents to test the effect of decreased DNAJC7 expression on cell response to proteotoxic stress and finds evidence that DNAJC7 regulates the activation of Heat shock factor 1 (HSF1) protein upon stress conditions.

      Strengths

      (1) This study uses the best model so far to study the interactome and possible mechanism of action of the DNAJC7 molecular chaperone in an iPSC-derived cellular model of motor neurons. Furthermore, the authors also looked into available transcriptome databases of ALS patient samples to further test whether their findings may have relevance to pathology.

      (2) The extent to which the authors are explicit about the sample sizes, protocols, and statistical tests used throughout this manuscript, should be applauded. This will help the whole field in their efforts to reliably replicate the results in this study.

      We thank the reviewer for highlighting the strengths of our study.

      Weaknesses

      (1) The most significant caveat of interactome experiments inherently comes from the method of choice. It is possible that by using the co-purification approach of DNAJC7 IP the resulting pool of binding partners is depleted in proteins that interact with DNAJC7 weakly or transiently. An alternative approach presumably more sensitive towards weaker binders could use the TurboID-based proximity-labeling method.

      The reviewer raises a valid point that TurboID-based proximity biotinylation could be a more sensitive approach for identifying DNAJC7 protein-protein interactions compared to IP-MS. We agree that this strategy could be better suited to detect weak or transient interactions, and we have previously used it to characterize protein nanoenvironments and interactomes in vitro and in vivo (Wang et al. Mol Psychiatry 2024, Quan et al. mBio 2024). However, proximity biotinylation also has significant limitations, such as potential artifacts due to overexpression and high background levels. We selected the IP-MS approach to identify DNAJC7 binding partners in neurons without the need to genetically modify or overexpress DNAJC7.

      (2) The authors mention in Results (and Figure 2D) that HNRNPA1 was identified as DNAJC7-interacting protein in their co-IP experiments, however, an identifier for this protein cannot be found in Figure 1C and Table S1 listing the proteomics results. Could the authors appropriately update Figure 1C and Table S1, or if HNRNPA1 wasn't really a hit then remove it from listed HNRNPs?

      We apologize for the confusion. HNRNPA1 was pulled down exclusively with DNAJC7 in 2/3 independent experiments and was initially included in our list of targets. However, in our final and most stringent analysis we only considered proteins that appeared in 3/3 experiments and thus HNRNPA1 was filtered out of Figure 1C and Table S1. We will therefore remove it from Figure 2D in the revised manuscript.

      (3) No further validation of DNAJC7-interacting proteins from the heat-shock protein (HSP) family. Current validation of mass spectrometry-identified proteins comes from IP-western blots with antibodies against HSPs. It would be interesting to further inspect possible interactions of these proteins by inspecting co-localization with immunocytochemistry.

      As the reviewer points out we did in fact validate the interaction of DNAJC7 with HSP90 and HSP70 (HSP90AB1 and HSPA1A) by IP-WB as shown in Fig 1F. We agree that examining co-localization of these proteins by immunocytochemistry (ICC) would be important to investigate. However, we have been unable to do this due to technical limitations. Specifically, we have tried to perform ICC using 6 commercially available DNAJC7 antibodies and have so far been unsuccessful. In our hands the DNAJC7 ICC signal appears to be non-specific as it is not reduced when using DNAJC7 knockout and knockdown cells as controls.

      (4) Similarly, the observation of DNAJC7 haploinsufficiency causing an increase in HNRNPU insolubility could be also easily further confirmed by checking for the emergence of "puncta" under a fluorescence microscope, in addition to provided WB experiments from MN lysates.

      This is a good suggestion, and we can assess the emergence of HNRNPU "puncta" by ICC in DNAJC7 mutant iPSC-derived neurons and/or postmortem sporadic ALS patient tissue.

      (5) I would like to recommend the authors to also provide with this manuscript a complete dataset (possibly in the form of a table, presented similarly as Table S1) resulting from experiments presented in Figures 2F and S2D. The information on upregulated and downregulated targets in their DNAJC7 haploinsufficiency model would be a valuable resource for the field and enable further investigations.

      This is a good suggestion and in the revised version we will provide in Table S2 the dataset presented in Figs. 2F and S2D.

      Reviewer #2 (Public review):

      Summary:

      The manuscript titled "The ALS-associated co-chaperone DNAJC7 mediates neuroprotection against proteotoxic stress by modulating HSF1 activity" describes experiments carried out in iPS cells re-differentiated into motor neurons (iNeurons, MNs) seeking to assess the functions of the J protein DnaJC7 in proteostasis. This study also investigates how an ALS-associated mutant variant (R156X) alters DnaJC7 function. The proteomic studies identify proteins interacting with DnaJC7. Using mRNA profiling in haplo-insufficient cells (+/R156X) compared to wild-type cells, the study seeks to identify pathways modulated by partial loss of DnaJC7 function. Studies in the DnaJC7 haplo-insufficient cells also indicate changes in the properties of ALS-associated proteins, such as HNRNPU and Matrin3, both of which are involved in the regulation of gene expression. The study also shows data indicating that DnaJC7 haploinsufficiency sensitizes cells to proteostatic stress induced by proteasome inhibition by MG132 and Hsp90 inhibition by Ganetespib. Lastly, the study investigates how DnaJC7 modulates the activity of the heat shock transcription factor (Hsf1) and thus the heat shock response.

      Strengths

      (1) The manuscript is well presented and most of the data is of high quality and convincing. The figures and supplementary figures are clear and easy to follow.

      (2) This study overall provides important new insights into a mostly underexplored molecular co-chaperone and its role in proteostasis. The proteomic and transcriptomic experiments certainly advance our understanding of DnaJC7. The MN model is well-suited for these studies addressing the role of DnaJC7, particularly regarding ALS. The haplo-insufficient MNs are also a suitable model to study a potential loss of function mechanism caused by (some) fALS-associated mutants in ALS, such as the R156X mutation used here.

      (3) Since so little is known about DnaJC7 function, the exploratory approaches applied here are particularly useful.

      We thank the reviewer for highlighting the strengths of our study.

      Weaknesses

      (1) Without follow-up studies, however, e.g., with select interacting proteins, the study provides merely a descriptive list of possible interactions without mechanistic insights. Also, most interactions have not been extensively (only a few examples) validated by other methods or individual experiments.

      We appreciate the reviewer's concern and agree that there are several intriguing DNAJC7 interactors worth studying further, which is why we wanted to share this resource with the broader community as quickly as possible. As the first study focused on DNAJC7 and its link to ALS, we could not possibly investigate multiple potential interactors and focused on two: HNRNPU and HSP70/HSP90, associated with RNA metabolism and stress response respectively, as these two pathways have previously been implicated in ALS pathogenesis. We do provide validation of these interactions and some mechanistic insight into how DNAJC7 haploinsufficiency impairs their function.

      A major limitation of the study in its current form is that none of the experimental approaches allow for assessing the specific functions of JC7. In the absence of specificity controls, e.g., other J proteins or HOP, which, like DnaJC7, contains TPR domains and can interact with Hsp70 and Hsp90, it remains unclear if the proposed functions of DnaJC7 are specific/unique or shared by other J proteins or molecular chaperones. Accordingly, it would be highly informative to add experiments to assess if some of the reported DnaJC7 protein-protein interactions and the transcriptional alterations in haplo-insufficient cells are DnaJC7-specific or also occur with other J proteins or molecular chaperones. This seems particularly important to discern specific DnaJC7 functions from general effects caused by impaired proteostasis.

      We agree with the reviewer that this is a very interesting question, as for example mutations in DNAJC6 can cause rare forms of Parkinson’s Disease (1). However, addressing the functional overlap of DNAJC7 with other J proteins such as DNAJC6 would require substantial time and resources and is out of scope of the current manuscript.

      It would be informative to explore how cellular stress (e.g., MG132 treatment) alters DnaJC7 interactions with other proteins (J proteins, HOP), ideally in additional/comparative proteomic studies. The mechanism underlying the proposed regulation of Hsf1 by DnaJC7 is not quite clear to me (Figures 4 A-I). There is no evidence of a direct physical interaction between DnaJC7 and Hsf1 in the proteomic data or elsewhere. It seems plausible that Hsf1/HSR dysregulation in the haplo-insufficient cells might be due to rather indirect effects, e.g., increased protein misfolding. Also, additional data showing differential activation of Hsf1 in +/+ versus +/- cells would strengthen this part, e.g. showing differences in Hsf1 trimerization, Hsp70 interactions, nuclear localization, etc.

      The reviewer makes two good points here. Firstly, we do agree we should provide additional data to better understand the differential activation of HSF1 in DNAJC7 heterozygous neurons and we will focus on this question during the revision. We also agree that the mechanism underlying the regulation of HSF1 by DNAJC7 is not well defined and we acknowledge it could be indirect. Of note, HSF1 activation is regulated by HSP70, of which DNAJC7 is a co-chaperone. We will attempt to define this mechanism better during the revision.

      The manuscript might also benefit from considering the literature showing an unusually inactive HSR and Hsf1 activity in motor neurons (e.g. published by the Durham lab).

      Yes—we did in fact note this in our discussion: “At the same time, mouse MNs have previously been shown to maintain a high threshold of induction of the HSF1-mediated stress response relative to other cell types including glial cells, with the suggestion that this contributes to their vulnerability to stress signals such as insoluble proteins.” We will further consider how our findings are in line with those of Durham et al., in the revised discussion.

      The correlation with transcriptomic data from ALS patients compared to neurotypical controls (Figures 4 L, M) suggesting a direct role of Hsf1/HSR seems unlikely at this point. In my view, the transcriptional dysregulation in ALS patients could be unrelated to Hsf1 dysregulation and caused by rather non-specific effects of neuronal decay in ALS.

      This is a very reasonable concern. We acknowledge that the HSF1 effects in patients could be driven by multiple other factors including C9-DPRs etc. However, the point of this analysis is not to claim that DNAJC7 is the cause, but rather to highlight the importance of the HSF1 pathway, which we identified as being mis-regulated in DNAJC7 mutant neurons, as broadly relevant in sporadic and other forms of genetic ALS.

      Reviewer #3 (Public review):

      Summary:

      Fleming et al sought to better understand DNAJC7's function in motor neurons as mutations in this gene have been associated with amyotrophic lateral sclerosis (ALS). The research question is relevant and important. The authors use an induced pluripotent stem cell (iPSC) line to derive motor neurons (iMNs) finding that DNAJC7 interacts with RNA-binding proteins (RBP) in wild-type cells and a truncated mutant DNAJC7[R156*] disrupts the RBP, hnRNPU, by promoting its accumulation into insoluble fractions. Given that DNAJC7 is predicted to regulate stress responses, the authors then find that DNAJC7[R156*] expression sensitizes the iMNs to proteasomal stress by disrupting the expression of the key heat stress response regulator, HSF1. These findings support that loss-of-function mutations in DNAJC7 will indeed sensitize motor neurons to proteotoxic stress, potentially driving ALS. The association with RBPs, which routinely are found to be disrupted in ALS, is of interest and warrants further study.

      Strengths

      (1) The research question is relevant and important. The authors provide interesting data that DNAJC7 mutations impact two important features in ALS, the dysregulation of RNA binding proteins and the sensitivity of motor neurons to proteotoxic stress.

      (2) The authors provide solid data to support their findings and the assays are appropriate.

      We thank the reviewer for highlighting the strengths of our study.

      Weaknesses

      (1) The authors rely on a single iPSC line throughout the text, using the same line to make the mutation-carrying cells. iPSCs are highly variable and at minimum 3 lines, typically 5 lines, should be used to define consistent findings. This work would be greatly strengthened if 3 or more lines were used to confirm consistent effects. This is particularly concerning given that iPSCs were differentiated using growth factors versus genetic induction. Growth-factor-based differentiations are more variable.

      We will substantiate the major findings by the use of additional models and genetic backgrounds during the revision. However, our experiments utilize isogenic controls and extensive quality control assays (on-target and off-target analysis, whole genome sequencing, karyotype etc.) to ensure that our isogenic lines are genomically identical, other than the DNAJC7 mutation, and thus any phenotypes are likely caused by mutant DNAJC7 itself.

      (2) The authors argue that HSF1 and its targets are downregulated in sporadic ALS and mutant C9orf72 ALS. The first concern is that these transcriptomics data were derived from cortical tissue which does not contain motor neurons (Pineda et al. 2024 Cell 187: 1971-1989.e1916). The second concern is that the inclusion of C9orf72 mutant tissue is not well justified as (1) this mutation is associated with an upregulation of HSF1 and its targets in patients (Mordes et al, Acta Neuropathol Commun 2018 6(1):55; Lee et al Neuron 2023 111(9):1381-1390) and (2) the C9orf72 mutation is associated with an ALS/FTD spectrum disorder defined by TDP-43 pathology. Disease mechanisms associated with this spectrum disorder may not overlap with traditional ALS which is typically defined by SOD1 pathology.

      SOD1 pathology represents only a small fraction (<2%) of all ALS patients and is therefore not traditional ALS. The majority (~97%) of sporadic and familial ALS cases (including C9orf72 but excluding SOD1 and FUS cases) are uniformly characterized by TDP-43 pathology. Nevertheless, we do agree that it would be better to assess spinal cord data, but unfortunately such single cell datasets from ALS patients do not currently exist. We acknowledge that the HSF1 effects in patients could be driven by multiple other factors including C9-DPRs etc. However, the point of this analysis is not to claim that DNAJC7 is the cause, but rather to highlight the importance of the HSF1 pathway, which we identified as being mis-regulated in DNAJC7 mutant neurons, as being broadly relevant in sporadic and other forms of genetic ALS.

      (3) As a whole, the findings are mechanistically disjointed, and additional experiments or discussion would help to connect the dots a bit more.

      We will revise the manuscript with additional experiments and discussion to better connect the dots.

      Citations

      (1) Kurian, M. A. & Abela, L. in GeneReviews® (eds M. P. Adam et al.) (University of Washington, Seattle, 1993).

    1. eLife assessment

      The authors made a useful finding that Zizyphi spinosi semen, a traditional Chinese medicine, has demonstrated excellent biological activity and potential therapeutic effects against Alzheimer's disease (AD). The researchers presented the effects, but the research evidence for the mechanism was incomplete. The main claims were only partially supported.

    2. Reviewer #1 (Public review):

      Summary:

      The study shows that Zizyphi spinosi semen (ZSS), particularly its non-extracted simple crush powder, has significant therapeutic effects on neurodegenerative diseases. It removes Aβ, tau, and α-synuclein oligomers, restores synaptophysin levels, enhances BDNF expression and neurogenesis, and improves cognitive and motor functions in mouse AD, FTD, DLB, and PD models. Additionally, ZSS powder reduces DNA oxidation and cellular senescence in normal-aged mice, increases synaptophysin, BDNF, and neurogenesis, and enhances cognition to levels comparable to young mice.

      Weaknesses:

      (1) While the study demonstrates that ZSS has protective effects across a wide range of animal models, including AD, FTD, DLB, PD, and both young and aged mice, it is broad and lacks a detailed investigation into the underlying mechanisms. This is the most significant concern.

      (2) The authors highlight that the non-extracted simple crush powder of ZSS shows more substantial effects than its hot water extract and extraction residue. However, the manuscript provides very limited data comparing the effects of these three extracts.

      (3) The authors have not provided a rationale for the dosing concentrations used, nor have they tested the effects of the treatment in normal mice to verify its impact under physiological conditions.

      (4) Regarding the assessment of cognitive function in mice, the authors only utilized the Morris Water Maze (MWM) test, which includes a five-day spatial learning training phase followed by a probe trial. The authors focused solely on the learning phase. However, it is relevant to note that data from the learning phase primarily reflects the learning ability of the mice, while the probe trial is more indicative of memory. Therefore, it is essential that probe trial data be included for a more comprehensive analysis. A justification should also be included to explain why the latency on the first day is about 50 s rather than 60 s.

      (5) The BDNF immunohistochemical staining in the manuscript appears to be non-specific.

      (6) The central pathological regions in PD are the substantia nigra and striatum. Please replace the staining results from the cortex and hippocampus with those from these regions in the PD model.

    3. Reviewer #2 (Public review):

      Summary:

      The authors studied the effects of hot water extract, extraction residue, and non-extracted simple crush powder of ZSS in diseased or aged mice. It was found that ZSS played an anti-neurodegenerative role by removing toxic proteins, repairing damaged neurons, and inhibiting cell senescence.

      Strengths:

      The authors studied the effects of ZSS in different transgenic mice and analyzed the different states of ZSS and the effects of different components.

      Weaknesses:

      The authors' study lacked an in-depth exploration of mechanisms, including changes in intracellular signal transduction, drug targets, and drug toxicity detection.

    4. Reviewer #3 (Public review):

      ZSS has been widely used in Traditional Chinese Medicine as a sleep-promoting herb. This study tests the effects of ZSS powder and extracts on AD, PD, and aging, and broad protective effects were revealed in mice.

      However, this work did not include a mechanistic study or target data on ZSS, and PK data were also not included. Studies of the mechanisms or targets and a PK study are suggested. A human PK study is preferred over one in mice or rats, e.g., to determine which main active ingredients are present, and at what concentration in plasma, in order to study the pharmacological mechanisms of ZSS.

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      (1) While the study demonstrates that ZSS has protective effects across a wide range of animal models, including AD, FTD, DLB, PD, and both young and aged mice, it is broad and lacks a detailed investigation into the underlying mechanisms. This is the most significant concern.

      We appreciate this comment. We recognize that elucidating the mechanism is an important research topic, and we are currently working on it. The purpose of publishing this paper at this time is to inform the public as soon as possible about natural materials and methods that may be effective in preventing dementia and neurodegenerative diseases, and to encourage similar research.

      (2) The authors highlight that the non-extracted simple crush powder of ZSS shows more substantial effects than its hot water extract and extraction residue. However, the manuscript provides very limited data comparing the effects of these three extracts.

      Certainly, it would be better to compare them in several different models, but we believe that important results have already been obtained in tau Tg mice, and comparative data in other models are just additive and confirmatory.

      (3) The authors have not provided a rationale for the dosing concentrations used, nor have they tested the effects of the treatment in normal mice to verify its impact under physiological conditions.

      As described in the Materials and Methods section, the dosage was determined based on the results of preliminary experiments. The beneficial effects in normal mice are shown in Figure 5.

      (4) Regarding the assessment of cognitive function in mice, the authors only utilized the Morris Water Maze (MWM) test, which includes a five-day spatial learning training phase followed by a probe trial. The authors focused solely on the learning phase. However, it is relevant to note that data from the learning phase primarily reflects the learning ability of the mice, while the probe trial is more indicative of memory. Therefore, it is essential that probe trial data be included for a more comprehensive analysis. A justification should also be included to explain why the latency on the first day is about 50 s rather than 60 s.

      We agree that it is better to include the results of the probe test. We did not include them this time, but we would like to include them in the future. In the memory acquisition training, five trials were performed per day. Since the mice learned the location of the platform during the first five trials, the latency on the first day became around 50 seconds.

      (5) The BDNF immunohistochemical staining in the manuscript appears to be non-specific.

      We cannot understand the basis for saying it is non-specific.

      (6) The central pathological regions in PD are the substantia nigra and striatum. Please replace the staining results from the cortex and hippocampus with those from these regions in the PD model.

      We examined the substantia nigra and found that synuclein pathology appeared in Tg mice and was suppressed by ZSS administration. However, because we did not investigate the striatum, we decided not to show the results for the nigrostriatal system this time. Instead, we thought that we could demonstrate the inhibitory effect of ZSS on synuclein pathology by showing the results for the cortex and hippocampus, which showed early functional decline in these mice (Fig. 4E).

      Reviewer #2 (Public review):

      The authors' study lacked an in-depth exploration of mechanisms, including changes in intracellular signal transduction, drug targets, and drug toxicity detection.

      We appreciate this comment. We understand that the mechanism, targets, and toxicity are important issues to be considered in the future.

      Reviewer #3 (Public review):

      However, this work did not include a mechanistic study or target data on ZSS, and PK data were also not included. Studies of the mechanisms or targets and a PK study are suggested. A human PK study is preferred over one in mice or rats, e.g., to determine which main active ingredients are present, and at what concentration in plasma, in order to study the pharmacological mechanisms of ZSS.

      We appreciate this comment. We understand that the mechanism and target are important issues to consider in the future. As the reviewer pointed out, to conduct PK studies, we must first identify the active ingredients. Unfortunately, we have not been able to identify them yet.

      Reviewer #2 (Recommendations for the authors):

      The authors have proved that ZSS has neuroprotective effects through rigorous animal experiments. However, ZSS contains other active substances besides jujuboside A, jujuboside B, and spinosin, which is more concerning. More critical data may be obtained if experiments have been designed to search for active substances.

      We appreciate this suggestion. We recognize that identifying the true active ingredients is a very important issue. Future studies will be designed to identify them and elucidate their mechanism of action.

    1. eLife Assessment

      This useful study presents a possible solution for a significant problem - that of draining vein sensitivity in functional MRI, which complicates the interpretability of laminar-fMRI results. The addition of a low diffusion-weighted gradient is presented to remove the draining vein signal and obtain functional responses with higher spatial fidelity. However, the strength of the evidence is incomplete, and most tests appear to have been done only in a single subject. Significance thresholds in presented maps are very low and most cortical depth-dependent response profiles do not differ from baseline, even in the BOLD data shown as reference. Curiously, even BOLD group data fails to replicate the well-known pattern of draining towards the cortical surface.

    2. Reviewer #1 (Public review):

      Summary:

      This study aims to provide imaging methods for users in the field of human layer-fMRI. This is an emerging field with 240 papers published so far. Contrary to what the manuscript implies, 3T is well represented among those papers (e.g., see the papers below that are not cited in the manuscript). Thus, the claim on the impact of developing 3T methodology for wider dissemination is not justified, specifically because some of the previous papers perform whole-brain layer-fMRI (also at 3T) with more efficient and more established procedures.

      The authors implemented a sequence with lots of nice features, including their own SMS EPI, diffusion bipolar pulses, and eye-saturation bands, and they built their own reconstruction around it. This is not trivial. Only a few labs around the world have this level of engineering expertise. I applaud this technical achievement. However, I doubt that any of this is the right tool for layer-fMRI, nor does it represent an advancement for the field. In the thermal-noise-dominated regime of sub-millimeter fMRI (especially at 3T), it is established to use 3D readouts over 2D (SMS) readouts. While it is not trivial to implement SMS, the vendor implementations (as well as the CMRR and MGH implementations) are already the most widely applied across the majority of current fMRI studies. The authors' work on this does not address any previous shortcomings in the field.

      The mechanism to use bi-polar gradients to increase the localization specificity is doubtful to me. In my understanding, killing the intra-vascular BOLD should make it less specific. Also, the empirical data do not suggest a higher localization specificity to me.

      Embedding this work in the literature of previous methods is incomplete. Recent trends of vessel signal manipulation with ABC or VAPER are not mentioned. Comparisons with VASO are outdated and incorrect.

      The reproducibility of the methods and the result is doubtful (see below).

      I don't think that this manuscript is in the top 50% of the 240 layer-fmri papers out there.

      3T layer-fMRI papers that are not cited:

      Taso, M., Munsch, F., Zhao, L., Alsop, D.C., 2021. Regional and depth-dependence of cortical blood-flow assessed with high-resolution Arterial Spin Labeling (ASL). Journal of Cerebral Blood Flow and Metabolism. https://doi.org/10.1177/0271678X20982382

      Wu, P.Y., Chu, Y.H., Lin, J.F.L., Kuo, W.J., Lin, F.H., 2018. Feature-dependent intrinsic functional connectivity across cortical depths in the human auditory cortex. Scientific Reports 8, 1-14. https://doi.org/10.1038/s41598-018-31292-x

      Lifshits, S., Tomer, O., Shamir, I., Barazany, D., Tsarfaty, G., Rosset, S., Assaf, Y., 2018. Resolution considerations in imaging of the cortical layers. NeuroImage 164, 112-120. https://doi.org/10.1016/j.neuroimage.2017.02.086

      Puckett, A.M., Aquino, K.M., Robinson, P.A., Breakspear, M., Schira, M.M., 2016. The spatiotemporal hemodynamic response function for depth-dependent functional imaging of human cortex. NeuroImage 139, 240-248. https://doi.org/10.1016/j.neuroimage.2016.06.019

      Olman, C.A., Inati, S., Heeger, D.J., 2007. The effect of large veins on spatial localization with GE BOLD at 3 T: Displacement, not blurring. NeuroImage 34, 1126-1135. https://doi.org/10.1016/j.neuroimage.2006.08.045

      Ress, D., Glover, G.H., Liu, J., Wandell, B., 2007. Laminar profiles of functional activity in the human brain. NeuroImage 34, 74-84. https://doi.org/10.1016/j.neuroimage.2006.08.020

      Huber, L., Kronbichler, L., Stirnberg, R., Ehses, P., Stocker, T., Fernández-Cabello, S., Poser, B.A., Kronbichler, M., 2023. Evaluating the capabilities and challenges of layer-fMRI VASO at 3T. Aperture Neuro 3. https://doi.org/10.52294/001c.85117

      Scheeringa, R., Bonnefond, M., van Mourik, T., Jensen, O., Norris, D.G., Koopmans, P.J., 2022. Relating neural oscillations to laminar fMRI connectivity in visual cortex. Cerebral Cortex. https://doi.org/10.1093/cercor/bhac154

      Strengths:

      See above. The authors developed their own SMS sequence with many features. This is important to the field, and it does not leave sequence development work to a few isolated monopoly labs. This work democratises SMS.
      The questions addressed here are of high relevance to the field: getting tools with good sensitivity, user-friendly applicability, and locally specific brain activity mapping is an important topic in the field of layer-fMRI.

      Weaknesses:

      (1) I feel the authors need to justify why flow-crushing helps localization specificity. There is an entire family of recent papers that aims to achieve higher localization specificity by doing the exact opposite. Namely, MT or ABC fMRI aims to increase the localization specificity by highlighting the intravascular BOLD by means of suppressing non-flowing tissue. To name a few:

      Priovoulos, N., de Oliveira, I.A.F., Poser, B.A., Norris, D.G., van der Zwaag, W., 2023. Combining arterial blood contrast with BOLD increases fMRI intracortical contrast. Human Brain Mapping hbm.26227. https://doi.org/10.1002/hbm.26227.

      Pfaffenrot, V., Koopmans, P.J., 2022. Magnetization Transfer weighted laminar fMRI with multi-echo FLASH. NeuroImage 119725. https://doi.org/10.1016/j.neuroimage.2022.119725

      Schulz, J., Fazal, Z., Metere, R., Marques, J.P., Norris, D.G., 2020. Arterial blood contrast ( ABC ) enabled by magnetization transfer ( MT ): a novel MRI technique for enhancing the measurement of brain activation changes. bioRxiv. https://doi.org/10.1101/2020.05.20.106666

      Based on this literature, it seems that the proposed method will make the vein problem worse, not better. The authors could make it clearer how they reason that making GE-BOLD signals more extra-vascular weighted should help to reduce large vein effects.

      The empirical evidence for the claim that flow crushing helps with the localization specificity should be made clearer. The response magnitude with and without flow crushing looks pretty much identical to me (see Fig. 6d).
      It's unclear to me what to look for in Fig. 5. I cannot discern any layer patterns in these maps. It's too noisy. The two maps of TE=43ms look like identical copies of each other. Maybe an editorial error?

      The authors discuss bipolar crushing with respect to SE-BOLD where it has been previously applied. For SE-BOLD at UHF, a substantial portion of the vein signal comes from the intravascular compartment. So I agree that for SE-BOLD, it makes sense to crush the intravascular signal. For GE-BOLD however, this reasoning does not hold. For GE-BOLD (even at 3T), most of the vein signal comes from extravascular dephasing around large unspecific veins and the bipolar crushing is not expected to help with this.

      (2) The bipolar crushing is limited to one single direction of flow. This introduces a lot of artificial variance across the cortical folding pattern. This is not mentioned in the manuscript. There is an entire family of papers that perform layer-fMRI with black-blood imaging that solves this with a 3D contrast preparation (VAPER) that is applied across a longer time period, thus killing the blood signal while it flows across all directions of the vascular tree. Here, the signal crushing is happening with a 2D readout as a "snap-shot" crushing. This does not allow the blood to flow in multiple directions.
      VAPER also accounts for BOLD contaminations of larger draining veins by means of a tag-control sampling. The proposed approach here does not account for this contamination.

      Chai, Y., Li, L., Huber, L., Poser, B.A., Bandettini, P.A., 2020. Integrated VASO and perfusion contrast: A new tool for laminar functional MRI. NeuroImage 207, 116358. https://doi.org/10.1016/j.neuroimage.2019.116358

      Chai, Y., Liu, T.T., Marrett, S., Li, L., Khojandi, A., Handwerker, D.A., Alink, A., Muckli, L., Bandettini, P.A., 2021. Topographical and laminar distribution of audiovisual processing within human planum temporale. Progress in Neurobiology 102121. https://doi.org/10.1016/j.pneurobio.2021.102121

      If I would recommend anyone to perform layer-fMRI with blood crushing, it seems that VAPER is the superior approach. The authors could make it clearer why users might want to use the unidirectional crushing instead.

      (3) The comparison with VASO is misleading.
      The authors claim that previous VASO approaches were limited by TRs of 8.2s. The authors might be advised to check the literature of the last few years.
      Koiso et al. performed whole-brain layer-fMRI VASO at 0.8mm with TRs of 3.9 seconds (with reliable activation), 2.7 seconds (with an unconvincing activation pattern, though), and 2.3 seconds (without activation).
      Also, whole-brain layer-fMRI BOLD at 0.5mm and 0.7mm has been previously performed by the Juelich group at TRs of 3.5s (their TR definition is 'fishy' though).

      Koiso, K., Müller, A.K., Akamatsu, K., Dresbach, S., Gulban, O.F., Goebel, R., Miyawaki, Y., Poser, B.A., Huber, L., 2023. Acquisition and processing methods of whole-brain layer-fMRI VASO and BOLD: The Kenshu dataset. Aperture Neuro 34. https://doi.org/10.1101/2022.08.19.504502

      Yun, S.D., Pais‐Roldán, P., Palomero‐Gallagher, N., Shah, N.J., 2022. Mapping of whole‐cerebrum resting‐state networks using ultra‐high resolution acquisition protocols. Human Brain Mapping. https://doi.org/10.1002/hbm.25855

      Pais-Roldan, P., Yun, S.D., Palomero-Gallagher, N., Shah, N.J., 2023. Cortical depth-dependent human fMRI of resting-state networks using EPIK. Front. Neurosci. 17, 1151544. https://doi.org/10.3389/fnins.2023.1151544

      The authors are correct that VASO is not advised as a turn-key method for lower brain areas, incl. Hippocampus and subcortex. However, the authors use this word of caution that is intended for inexperienced "users" as a statement that this cannot be performed. This statement is taken out of context. This statement is not from the academic literature. It's advice for the 40+ user base that want to perform layer-fMRI as a plug-and-play routine tool in neuroscience usage. In fact, sub-millimeter VASO is routinely being performed by MRI-physicists across all brain areas (including deep brain structures, hippocampus etc). E.g. see Koiso et al. and an overview lecture from a layer-fMRI workshop that I had recently attended: https://youtu.be/kzh-nWXd54s?si=hoIJjLLIxFUJ4g20&t=2401

      Thus, the authors could embed this phrasing into the context of their own method that they are proposing in the manuscript. E.g. the authors could state whether they think that their sequence has the potential to be disseminated across sites, considering that it requires slow offline reconstruction in Matlab?
      Do the authors think that the results shown in Fig. 6c are suggesting turn-key acquisition of a routine mapping tool? In my humble opinion it looks like random noise, with most of the activation outside the ROI (in white matter).

      (4) The repeatability of the results is questionable.
      The authors perform experiments about the robustness of the method (line 620). The corresponding results do not suggest any robustness to me. In fact, the layer profiles in Fig. 4c vs. Fig. 4d are completely opposite. Locations of peaks turn into locations of dips and vice versa.
      The methods are not described in enough detail to reproduce these results.
      The authors mention that their image reconstruction is done "using in-house MATLAB code" (line 634). They do not post a link to GitHub, nor do they say if they share this code.

      It is not trivial to get good phase data for fMRI. The authors do not mention how they perform the respective coil-combination.
      No data are shared for reproduction of the analysis.

      (5) The application of NORDIC is not validated.
      Previous applications of NORDIC in 3T layer-fMRI have had mixed success. When not adjusted for the right SNR regime, it can result in artifactual reductions of beta scores, depending on the SNR across layers. The authors could validate their application of NORDIC and confirm that the average layer-profiles are unaffected by the application of NORDIC. Also, the NORDIC version should be explicitly mentioned in the manuscript.

      Akbari, A., Gati, J.S., Zeman, P., Liem, B., Menon, R.S., 2023. Layer Dependence of Monocular and Binocular Responses in Human Ocular Dominance Columns at 7T using VASO and BOLD (preprint). Neuroscience. https://doi.org/10.1101/2023.04.06.535924

      Knudsen, L., Guo, F., Huang, J., Blicher, J.U., Lund, T.E., Zhou, Y., Zhang, P., Yang, Y., 2023. The laminar pattern of proprioceptive activation in human primary motor cortex. bioRxiv. https://doi.org/10.1101/2023.10.29.564658

      Comments on revisions:

      Among all the concerns mentioned above, I think only one of the specific issues was sufficiently addressed.
      The authors implemented a combination of three consecutive-dimensional flow crushers. Other concerns were not sufficiently addressed to change my confidence level of the study.
      - While the abstract is still focusing on the utility of using 3T, the authors do not give credit to early 3T layer-fMRI papers leading the way to larger coverage and connectivity applications.
      - While the authors' choice of using a custom SMS 2D readout is justified for them, I do not think that this very method will enable widespread 3T whole-brain connectivity experiments across the global 3T community. This lowers the impact of the paper.
      - The images in Fig. 5 are still suspiciously similar, to the level that the noise pattern outside the brain is identical across large parts of the maps with and without PR.
      - Maybe it's my ignorance, but I still do not see why flow crushing focuses the local BOLD responses to small vessels.
      - While my impression of a misleading representation of the literature had been accompanied by explicit references, the authors claim that they cannot find them?!? Or claim that they are about something else (which they are not, in my viewpoint).
      Data and software are still not shared (not even example data, or nii data).

    3. Reviewer #2 (Public review):

      This study developed a setup for laminar fMRI at 3T that aimed to get the best from all worlds in terms of brain coverage, temporal resolution, sensitivity to detect functional responses and spatial specificity. They used a gradient-echo EPI readout to facilitate sensitivity, brain coverage and temporal resolution. The former was additionally boosted by NORDIC denoising and the latter two were further supported by acceleration both in-plane and across slices. The authors evaluated whether the implementation of velocity-nulling (VN) gradients could mitigate macrovascular bias, known to hamper laminar specificity of gradient-echo BOLD.

      Strengths:

      The setup includes 0.9 mm isotropic acquisitions with large coverage at a reasonable TR. These parameters are hard to optimize simultaneously, and I applaud the ambitious attempt to get "the best from all worlds" (large coverage, high spatio/temporal resolution, spatial specificity, sensitivity), which is sought after in the field. Also, in terms of the availability of the method, it is favorable that it benefits from lower field strength (additional time for VN-gradient implementation, afforded by longer gray matter T2*). Furthermore, I like that the authors took steps to improve the original manuscript by e.g., collecting more data, adjusting the VN implementation to include flow-suppression along three rather than a single dimension, and adjusting the ROI-definition procedure to avoid circularity issues.

      That being said, I still find the evidence weak in terms of this sequence achieving high spatial specificity and sensitivity. The results feel oversold and further validation is needed to make a case for the authors' conclusion that "[...] the potential impact of this development is expected to be extensive across various domains of neuroscience research". This is elaborated in the comments below:

      The authors acknowledge that the VN setup in its current form probably does not suppress the impact of most ascending veins (these are also not targeted by phase regression, as most are probably too small to produce sufficiently large phase responses). This seems to limit the theoretical support for the authors' claim of reduced inter-layer blurring (e.g. the claim that deep and superficial signals are less coupled with VN gradients than without based on Fig 6-7). This limitation notwithstanding, the method may still be helpful for limiting laminar dependencies by suppressing pial vein responses (which may carry signal from distant regions and layers that blur into superficial layers if left unsuppressed). Unfortunately, the empirical support for VN gradients suppressing superficial bias seems quite weak and is hard to evaluate. For example, the profiles in Figure 4 do not consistently show clearly less superficial bias when VN gradients are on - this might partly be due to the fact that clear bias was not always present in the profiles even without VN. I suspect this is largely explained by the selection of very small and quite unrepresentative ROIs. The corresponding activation maps appear strongly weighted towards CSF, which is not always captured in the profile. I recommend sampling a much larger patch of cortex to more accurately capture the actual underlying bias. In this way, all non-VN profiles should have clear bias, which should be clearly suppressed for VN if the method is effective. The authors do evaluate the effect of VN/phase regression based on a large activated region in visual cortex (Fig 5) - why not show laminar profiles from here, which is an obvious way to show the effect on superficial bias? I think such evaluations would be a more direct way of evaluating the method's impact on specificity, and are necessary for subsequent FC evaluations to be convincing.

      The phase regression results are described inconsistently. In the results section, the authors, in my opinion, "correctly" acknowledge that phase regression seemed to have a very minor impact. However, in the discussion section it is described as if phase regression was effective in suppressing macrovascular responses (L 553-558), which the results do not support (especially based on profiles in Fig 4). There is barely any difference with/without phase regression, which may be due to the fact that ordinary least squares regression was chosen over a Deming model, which accounts for noise on the phase regressor. Although the authors correctly mentioned in their "answers to reviewers" that the required noise-ratio between magnitude and phase data can be hard to estimate, attempts at this have been described in previous phase regression studies which showed much larger effects (see e.g. Stanley et al. 2020, Knudsen et al. 2023).
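      The difference between ordinary least squares and a Deming fit can be illustrated with a minimal sketch. It is not the authors' pipeline: the data are simulated, the function names (`ols_slope`, `deming_slope`) are hypothetical, and the noise-variance ratio `delta` is assumed to be known, which is exactly the quantity that is hard to estimate in practice. With noise on the regressor, the OLS slope is attenuated, whereas the Deming slope recovers the underlying relation.

```python
import math
import random


def _moments(x, y):
    """Return the (biased) variances of x and y and their covariance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return sxx, syy, sxy


def ols_slope(x, y):
    """Ordinary least squares slope of y on x (assumes x is noise-free)."""
    sxx, _, sxy = _moments(x, y)
    return sxy / sxx


def deming_slope(x, y, delta):
    """Deming slope; delta = var(noise in y) / var(noise in x), assumed known."""
    sxx, syy, sxy = _moments(x, y)
    d = syy - delta * sxx
    return (d + math.sqrt(d * d + 4.0 * delta * sxy * sxy)) / (2.0 * sxy)


# Simulated example (hypothetical values): a noisy "phase" regressor and a
# "magnitude" signal that depends linearly on the noise-free phase.
rng = random.Random(0)
n = 20000
true_slope = 2.0
phase_true = [rng.gauss(0.0, 1.0) for _ in range(n)]
phase_obs = [p + rng.gauss(0.0, 1.0) for p in phase_true]          # regressor noise
magnitude = [true_slope * p + rng.gauss(0.0, 1.0) for p in phase_true]

slope_ols = ols_slope(phase_obs, magnitude)
slope_deming = deming_slope(phase_obs, magnitude, delta=1.0)
print(f"OLS slope:    {slope_ols:.2f}")
print(f"Deming slope: {slope_deming:.2f}")
```

      In this simulation the regressor noise has the same variance as the signal, so OLS is expected to recover only about half of the true slope. An underestimated slope would remove too little of the phase-related component from the magnitude data, which is consistent with the small effect of phase regression noted above.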

      I like that the authors put in additional efforts to provide analyses to validate their NORDIC implementation. However, this needs to be done on the VN setup directly, not the "regular BOLD setup" with b=0, since the ability of NORDIC to distinguish signal and noise components depends on CNR, which is expected to deviate for these setups. Also, it seems z-scores and confidence intervals were computed based on GLM residuals, which may lead to inflated z-values and overly narrow CIs due to reduced degrees of freedom following denoising. The denoised z-maps from Fig 3 indeed look somewhat strange, i.e. seemingly increased false positives (more salt/pepper and a bunch of white matter activation) with very weak hand knob activation. Also, something must be wrong with the CIs on the laminar profiles - they seem extremely narrow despite noise levels obviously being high for highly accelerated 3T submillimeter results extracted from a very small ROI. The authors may consider computing these statistics from variance across trials instead.
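      A minimal sketch of the across-trial alternative suggested above, assuming one response estimate per trial is available for a given cortical depth. The function name and the example values are hypothetical and are not taken from the manuscript. The point is that the uncertainty is derived from trial-to-trial variability, so the degrees of freedom follow the number of trials rather than the number of (denoised) time points.

```python
import math
import statistics


def across_trial_stats(trial_responses):
    """Mean, standard error, t-value, and degrees of freedom across trials."""
    n = len(trial_responses)
    if n < 2:
        raise ValueError("need at least two trials to estimate variance")
    mean = statistics.fmean(trial_responses)
    sd = statistics.stdev(trial_responses)   # sample SD (n - 1 in the denominator)
    sem = sd / math.sqrt(n)
    t_value = mean / sem if sem > 0 else math.inf
    return mean, sem, t_value, n - 1


# Hypothetical per-trial percent signal changes at one cortical depth.
example_trials = [0.8, 1.3, 0.5, 1.1, 0.9, 1.4, 0.7, 1.0]
mean, sem, t_value, dof = across_trial_stats(example_trials)
print(f"mean={mean:.3f}, SEM={sem:.3f}, t={t_value:.2f}, dof={dof}")
```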

      Given that the idea of the setup is to take advantage in terms of sensitivity by using GE-BOLD contrast relative to e.g. SE-EPI or CBV-weighted setups, they need to carefully demonstrate the sensitivity of their setup, which could be limited by high acceleration factors, the VN gradients, low field strength, etc. I like that they now put more emphasis on non-masked activation maps, but further comparison could be made through tSNR maps, raw single-volume images, raw timeseries, CNR based on across-trial variance, etc.

      The major rationale for the setup is to achieve functional connectivity (FC) with brain-wide coverage at laminar resolutions, but it is framed as if this is something that has not been possible in the past with existing setups (statements such as: "Despite advancements in acquisition speed, current CBV/CBF-based fMRI techniques remain inadequate for layer-dependent resting-state fMRI" (L138-140)). To me, the functional connectivity results presented here with the VN setup are clearly less convincing than what has been shown with e.g. CBV-weighted acquisitions (e.g. Huber et al. 2021, Chai et al. 2024). The VN setup might also have advantages such as larger coverage as mentioned by the authors, but they fail to balance the comparison by highlighting where previous studies had clear edges. Thus, the impact of the results needs to be toned down and a more balanced comparison with existing laminar FC studies is warranted, for example by acknowledging that the CBV-weighted studies demonstrate much higher spatial specificity.

      Overall I would recommend a stronger emphasis on validating the claims about the sequence on task-based data for which there is a large body of literature to benchmark against (e.g. laminar fMRI studies in V1 and M1), before going to FC where the base for comparison and reference is much more limited in humans at laminar scales.

    4. Reviewer #3 (Public review):

      Summary:

      The authors are looking for a spatially specific functional brain response to visualise non-invasively with 3T (clinical field strength) MRI. They propose a velocity-nulled weighting to remove signal from draining veins in a submillimeter multiband acquisition.

      Strengths:

      - This manuscript addresses a real need in the cognitive neuroscience community interested in imaging responses in cortical layers in-vivo in humans.<br /> - An additional benefit is the proposed implementation at 3T, a widely available field strength.

      Weaknesses:

      - The comparison in Figure 4 for different b-values shows % signal changes. However, as the baseline signal changes with added diffusion weighting, this is rather uninformative. A plot of t-values against cortical depth would be more insightful.
      - Surprisingly, the %-signal change for a b-value of 0 is below 1% for 3/4 participants, even at the cortical surface. This raises some doubts about the task or ROI definition. A finger-tapping task should reliably engage the primary motor cortex, even at 3T, and even in individual participants.
      - The double peak pattern in the BOLD weighted images in Figure 4 is unexpected given the existing literature on BOLD responses as a function of cortical depth.
      - Although I'd like to applaud the authors for their ambition with the connectivity analysis, the low significance threshold used in these maps (z=1.64) leads to concerns about the SNR of the underlying data.
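      To put the z=1.64 threshold in context, the short sketch below converts a z-threshold into a one-sided, uncorrected p-value and into the number of voxels expected to exceed it under the null hypothesis. The voxel count is a hypothetical placeholder, not a figure from the manuscript, and the sketch assumes the z-values are standard normal under the null.

```python
import math


def one_sided_p(z):
    """Upper-tail probability of a standard normal variable exceeding z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def expected_false_positives(z, n_voxels):
    """Expected number of null voxels above threshold, without correction."""
    return one_sided_p(z) * n_voxels


z_threshold = 1.64
n_voxels = 100_000  # hypothetical number of voxels in the analysis mask
p = one_sided_p(z_threshold)
print(f"one-sided p at z={z_threshold}: {p:.4f}")
print(f"expected null voxels above threshold: {expected_false_positives(z_threshold, n_voxels):.0f}")
```

      A threshold of z=1.64 corresponds to roughly p=0.05 one-sided without correction for multiple comparisons, so about one in twenty voxels would be expected to pass it by chance alone.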

      I remain unconvinced of the conclusion that the developed VN fMRI exhibited layer specificity - the double peak which is taken as a marker of specificity is not absent in the BOLD responses either, and overall BOLD and VN response profiles as a function of cortical depth are quite similar.

    5. Author response:

      The following is the authors’ response to the original reviews.

      General responses:

      The authors sincerely thank all the reviewers for their valuable and constructive comments. We also apologize for the long delay in providing this rebuttal due to logistical and funding challenges. In this revision, we modified the bipolar gradients from one single direction to all three directions. Additionally, in response to the concerns regarding data reliability, we conducted a thorough examination of each step in our data processing pipeline. In the original processing workflow, the projection-onto-convex-set (POCS) method was used for partial Fourier reconstruction. Upon examination, we found that applying the POCS method after parallel image reconstruction significantly altered the signal and resulted in considerable loss of functional features. Furthermore, the original scan protocol employed a TE of 46 ms, which is notably longer than the typical TE of 33 ms. A prolonged TE can increase the ratio of extravascular to intravascular contributions. Importantly, the impact of TE on the efficacy of phase regression remains unclear, introducing potential confounding effects. To address these issues, we revised the protocol by shortening the TE from 46 ms to 39 ms. This adjustment was achieved by modifying the SMS factor to 3 and the in-plane acceleration rate to 3, thereby minimizing the confounding effects associated with an extended TE.

      Following these changes, we recollected task-based fMRI data (N=4) and resting-state fMRI data (N=14) under the updated protocol. Using the revised dataset, we validated layer-specific functional connectivity (FC) through seed-based analyses. These analyses revealed distinct connectivity patterns in the superficial and deep layers of the primary motor cortex (M1), with statistically significant inter-layer differences. Furthermore, additional analyses with a seed in the primary sensory cortex (S1) corroborated the robustness and reliability of the revised methodology. We also changed the ‘directed’ functional connectivity in the title to ‘layer-specific’ functional connectivity, as drawing conclusions about directionality requires auxiliary evidence beyond the scope of this study.

      We provide detailed responses to the reviewers’ comments below.

      Reviewer #1 (Public Review):

      Summary:

      (1)   This study aims to provide imaging methods for users of the field of human layer-fMRI. This is an emerging field with 240 papers published so far. Different than implied in the manuscript, 3T is well represented among those papers. E.g. see the papers below that are not cited in the manuscript. Thus, the claim on the impact of developing 3T methodology for wider dissemination is not justified. Specifically, because some of the previous papers perform whole brain layer-fMRI (also at 3T) in more efficient, and more established procedures.

      3T layer-fMRI papers that are not cited:

      Taso, M., Munsch, F., Zhao, L., Alsop, D.C., 2021. Regional and depth-dependence of cortical blood-flow assessed with high-resolution Arterial Spin Labeling (ASL). Journal of Cerebral Blood Flow and Metabolism. https://doi.org/10.1177/0271678X20982382

      Wu, P.Y., Chu, Y.H., Lin, J.F.L., Kuo, W.J., Lin, F.H., 2018. Feature-dependent intrinsic functional connectivity across cortical depths in the human auditory cortex. Scientific Reports 8, 1-14. https://doi.org/10.1038/s41598-018-31292-x

      Lifshits, S., Tomer, O., Shamir, I., Barazany, D., Tsarfaty, G., Rosset, S., Assaf, Y., 2018. Resolution considerations in imaging of the cortical layers. NeuroImage 164, 112-120. https://doi.org/10.1016/j.neuroimage.2017.02.086

      Puckett, A.M., Aquino, K.M., Robinson, P.A., Breakspear, M., Schira, M.M., 2016. The spatiotemporal hemodynamic response function for depth-dependent functional imaging of human cortex. NeuroImage 139, 240-248. https://doi.org/10.1016/j.neuroimage.2016.06.019

      Olman, C.A., Inati, S., Heeger, D.J., 2007. The effect of large veins on spatial localization with GE BOLD at 3 T: Displacement, not blurring. NeuroImage 34, 1126-1135. https://doi.org/10.1016/j.neuroimage.2006.08.045

      Ress, D., Glover, G.H., Liu, J., Wandell, B., 2007. Laminar profiles of functional activity in the human brain. NeuroImage 34, 74-84. https://doi.org/10.1016/j.neuroimage.2006.08.020

      Huber, L., Kronbichler, L., Stirnberg, R., Ehses, P., Stocker, T., Fernández-Cabello, S., Poser, B.A., Kronbichler, M., 2023. Evaluating the capabilities and challenges of layer-fMRI VASO at 3T. Aperture Neuro 3. https://doi.org/10.52294/001c.85117

      Scheeringa, R., Bonnefond, M., van Mourik, T., Jensen, O., Norris, D.G., Koopmans, P.J., 2022. Relating neural oscillations to laminar fMRI connectivity in visual cortex. Cerebral Cortex. https://doi.org/10.1093/cercor/bhac154

      We thank the reviewer for listing eight papers related to 3T layer-fMRI. The primary goal of our work is to develop a methodology for brain-wide, layer-dependent resting-state functional connectivity at 3T. Upon review of the cited papers, we found that:

      (1) One study (Lifshits et al.) was not an fMRI study.

      (2) One study (Olman et al.) was conducted at 7T, not 3T.

      (3) Two studies (Taso et al. and Wu et al.) employed relatively large voxel sizes (1.6 × 2.3 × 5 mm³ and 1.5 mm isotropic, respectively), which limits layer specificity.

      (4) Only one of the listed studies (Huber et al., Aperture Neuro 2023) provides coverage of more than half of the brain.

      While each of these studies offers valuable insights, the VASO study by Huber et al. is the most relevant to our work, given its brain-wide coverage. However, the VASO method employs a relatively long TR (14.137 s), which may not be optimal for resting-state functional connectivity analyses.

      To address these limitations, our proposed method achieves submillimeter resolution, layer specificity, brain-wide coverage, and a significantly shorter TR (<5 s) altogether. We believe this advancement provides a meaningful contribution to the field, enabling broader applicability of layer-fMRI at 3T.

      (2) The authors implemented a sequence with lots of nice features. Including their own SMS EPI, diffusion bipolar pulses, eye-saturation bands, and they built their own reconstruction around it. This is not trivial. Only a few labs around the world have this level of engineering expertise. I applaud this technical achievement. However, I doubt that any of this is the right tool for layer-fMRI, nor does it represent an advancement for the field. In the thermal noise dominated regime of sub-millimeter fMRI (especially at 3T), it is established to use 3D readouts over 2D (SMS) readouts. While it is not trivial to implement SMS, the vendor implementations (as well as the CMRR and MGH implementations) are most widely applied across the majority of current fMRI studies already. The author's work on this does not serve any previous shortcomings in the field.

      We would like to thank the reviewer for their comments and the recognition of the technical efforts in implementing our sequence. We would like to address the points raised:

      (1) We completely agree that in-house implementation of existing techniques does not constitute an advancement for the field. We did not claim otherwise in the manuscript. Our focus was on the development of a method for brain-wide, layer-dependent resting-state functional connectivity at 3T, as mentioned in the response above.

      (2) The reviewer stated that "it is established to use 3D readouts over 2D (SMS) readouts". This is a strong claim, and we believe it requires robust evidence to support it. While it is true that 3D readouts can achieve higher tSNR in certain regions, such as the central brain, as shown in the study by Vizioli et al. (ISMRM 2020 abstract; https://cds.ismrm.org/protected/20MProceedings/PDFfiles/3825.html?utm_source=chatgpt.com ), higher tSNR does not necessarily equate to improved detection power in fMRI studies. For instance, Le Ster et al. (PLOS ONE, 2019; https://doi.org/10.1371/journal.pone.0225286 ) demonstrated that while 3D EPI had higher tSNR in the central brain, SMS EPI produced higher t-scores in activation maps.

      (3) When choosing between SMS EPI and 3D EPI, multiple factors should be taken into account, not just tSNR. For example, SMS EPI and 3D EPI differ in their sensitivity to motion and the complexity of motion correction. The choice between them depends on the specific research goals and practical constraints.

      (4) We are open to different readout strategies, provided they can be demonstrated to be suitable for the research goals. In this study, we opted for 2D SMS primarily due to logistical considerations. This choice does not preclude the potential use of 3D readouts in the future if they are deemed more appropriate for the project objectives.

      The mechanism to use bi-polar gradients to increase the localization specificity is doubtful to me. In my understanding, killing the intra-vascular BOLD should make it less specific. Also, the empirical data do not suggest a higher localization specificity to me.

      We elaborate on the mechanism and reasoning in the later responses.

      Embedding this work in the literature of previous methods is incomplete. Recent trends of vessel signal manipulation with ABC or VAPER are not mentioned. Comparisons with VASO are outdated and incorrect.

      The reproducibility of the methods and the result is doubtful (see below).

      In this revision, we updated the scan protocol and recollected the imaging data. Detailed explanations and revised results are provided in the later responses.

      I don't think that this manuscript is in the top 50% of the 240 layer-fmri papers out there.

      We respect the reviewer’s personal opinion. However, we can only address scientific comments or critiques.

      Strengths:

      See above. The authors developed their own SMS sequence with many features. This is important to the field, and it does not leave sequence development work to a few isolated monopoly labs. This work democratises SMS.

      The questions addressed here are of high relevance to the field: getting tools with good sensitivity, user-friendly applicability, and locally specific brain activity mapping is an important topic in the field of layer-fMRI.

      Weaknesses:

      (1) I feel the authors need to justify why flow-crushing helps localization specificity. There is an entire family of recent papers that aim to achieve higher localization specificity by doing the exact opposite. Namely, MT or ABC fMRI aims to increase the localization specificity by highlighting the intravascular BOLD by means of suppressing non-flowing tissue. To name a few:

      Priovoulos, N., de Oliveira, I.A.F., Poser, B.A., Norris, D.G., van der Zwaag, W., 2023. Combining arterial blood contrast with BOLD increases fMRI intracortical contrast. Human Brain Mapping hbm.26227. https://doi.org/10.1002/hbm.26227.

      Pfaffenrot, V., Koopmans, P.J., 2022. Magnetization Transfer weighted laminar fMRI with multi-echo FLASH. NeuroImage 119725. https://doi.org/10.1016/j.neuroimage.2022.119725

      Schulz, J., Fazal, Z., Metere, R., Marques, J.P., Norris, D.G., 2020. Arterial blood contrast ( ABC ) enabled by magnetization transfer ( MT ): a novel MRI technique for enhancing the measurement of brain activation changes. bioRxiv. https://doi.org/10.1101/2020.05.20.106666

      Based on this literature, it seems that the proposed method will make the vein problem worse, not better. The authors could make it clearer how they reason that making GE-BOLD signals more extra-vascular weighted should help to reduce large vein effects.

      The proposed VN fMRI method employs VN gradients to selectively suppress signals from fast-flowing blood in large vessels. Although this approach may initially appear to diverge from the principles of CBV-based techniques (Chai et al., 2020; Huber et al., 2017a; Pfaffenrot and Koopmans, 2022; Priovoulos et al., 2023), which enhance sensitivity to vascular changes in arterioles, capillaries, and venules while attenuating signals from static tissue and large veins, it aligns with the fundamental objective of all layer-specific fMRI methods. Specifically, these approaches aim to maximize spatial specificity by preserving signals proximal to neural activation sites and minimizing contributions from distal sources, irrespective of whether the signals are intra- or extra-vascular in origin. In the context of intravascular signals, CBV-based methods preferentially enhance sensitivity to functional changes in small vessels (proximal components) while demonstrating reduced sensitivity to functional changes in large vessels (distal components). For extravascular signals, functional changes are a mixture of proximal and distal influences. While tissue oxygenation near neural activation sites represents a proximal contribution, extravascular signal contamination from large pial veins reflects distal effects that are spatially remote from the site of neuronal activity. CBV-based techniques mitigate this challenge by unselectively suppressing signals from static tissues, thereby highlighting contributions from small vessels. In contrast, the VN fMRI method employs a targeted suppression strategy, selectively attenuating signals from large vessels (distal components) while preserving those from small vessels (proximal components). Furthermore, the use of a 3T scanner and the inclusion of phase regression in the VN approach mitigates contamination from large pial veins (distal components) while preserving signals reflecting local tissue oxygenation (proximal components). 
By integrating these mechanisms, VN fMRI improves spatial specificity, minimizing both intravascular and extravascular contributions that are distal to neuronal activation sites. We have incorporated this response into the Discussion section.

      The empirical evidence for the claim that flow crushing helps with the localization specificity should be made clearer. The response magnitude with and without flow crushing looks pretty much identical to me (see Fig. 6d).

      In the new results in Figure 4, the application of VN gradients attenuated the bias towards the pial surface. Consistent with the results in Figure 4, Figure 5 also demonstrated the suppression of macrovascular signal by VN gradients.

      It's unclear to me what to look for in Fig. 5. I cannot discern any layer patterns in these maps. It's too noisy. The two maps of TE=43ms look like identical copies from each other. Maybe an editorial error?

      In this revision, the original Figure 5 has been removed. However, we would like to clarify that the two maps with TE = 43 ms in the original Figure 5 were not identical. This can be observed in the difference map provided in the right panel of the figure.

      The authors discuss bipolar crushing with respect to SE-BOLD where it has been previously applied. For SE-BOLD at UHF, a substantial portion of the vein signal comes from the intravascular compartment. So I agree that for SE-BOLD, it makes sense to crush the intravascular signal. For GE-BOLD however, this reasoning does not hold. For GE-BOLD (even at 3T), most of the vein signal comes from extravascular dephasing around large unspecific veins, and the bipolar crushing is not expected to help with this.

      The reviewer’s statement that "most of the vein signal comes from extravascular dephasing around large unspecific veins" may hold true for 7T. However, at 3T, the susceptibility-induced Larmor frequency shift is reduced by 57%, and the extravascular contribution decreases by more than 35%, as shown by Uludağ et al. 2009 ( DOI: 10.1016/j.neuroimage.2009.05.051 ).

      Additionally, according to the biophysical models (Ogawa et al., 1993; doi: 10.1016/S0006-3495(93)81441-3 ), the extravascular contamination from the pial surface is inversely proportional to the square of the distance from the vessel. For a vessel diameter of 0.3 mm and an isotropic voxel size of 0.9 mm, the induced frequency shift is reduced by at least 36-fold at the next voxel. Notably, a vessel diameter of 0.3 mm is larger than most pial vessels. Theoretically, the extravascular effect contributes minimally to inter-layer dependency, particularly at 3T compared to 7T due to weaker susceptibility-related effects at lower field strengths. Empirically, as shown in Figure 7c, the results at M1 demonstrated that layer specificity can be achieved statistically with the application of VN gradients. We have incorporated this explanation into the Introduction and Discussion sections of the manuscript.
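
      The 36-fold figure quoted above can be checked with a short calculation. The sketch below assumes that the frequency shift scales as (vessel radius / distance)², that the distance is measured from the vessel axis, and that "the next voxel" means one voxel width (0.9 mm) away; the angular dependence of the field around the vessel is ignored. The function name is hypothetical and is not taken from the manuscript.

```python
def relative_frequency_shift(vessel_diameter_mm: float, distance_mm: float) -> float:
    """Frequency shift at `distance_mm` from the vessel axis, relative to the
    shift at the vessel wall, assuming a (radius / distance)**2 falloff."""
    radius_mm = vessel_diameter_mm / 2.0
    if distance_mm < radius_mm:
        raise ValueError("distance must be at or beyond the vessel wall")
    return (radius_mm / distance_mm) ** 2


# Values quoted in the response: 0.3 mm vessel diameter, 0.9 mm isotropic voxels.
ratio = relative_frequency_shift(vessel_diameter_mm=0.3, distance_mm=0.9)
fold_reduction = 1.0 / ratio
print(f"Relative shift one voxel away: {ratio:.4f} (about {fold_reduction:.0f}-fold reduction)")
```

      Under these assumptions the ratio is (0.15 / 0.9)² = 1/36, which matches the stated reduction; smaller vessels or larger distances give a stronger reduction.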

      (2) The bipolar crushing is limited to one single direction of flow. This introduces a lot of artificial variance across the cortical folding pattern. This is not mentioned in the manuscript. There is an entire family of papers that perform layer-fmri with black-blood imaging that solves this with a 3D contrast preparation (VAPER) that is applied across a longer time period, thus killing the blood signal while it flows across all directions of the vascular tree. Here, the signal crushing is happening with a 2D readout as a "snap-shot" crushing. This does not allow the blood to flow in multiple directions.

      VAPER also accounts for BOLD contaminations of larger draining veins by means of a tag-control sampling. The proposed approach here does not account for this contamination.

      Chai, Y., Li, L., Huber, L., Poser, B.A., Bandettini, P.A., 2020. Integrated VASO and perfusion contrast: A new tool for laminar functional MRI. NeuroImage 207, 116358. https://doi.org/10.1016/j.neuroimage.2019.116358

      Chai, Y., Liu, T.T., Marrett, S., Li, L., Khojandi, A., Handwerker, D.A., Alink, A., Muckli, L., Bandettini, P.A., 2021. Topographical and laminar distribution of audiovisual processing within human planum temporale. Progress in Neurobiology 102121. https://doi.org/10.1016/j.pneurobio.2021.102121

      If I would recommend anyone to perform layer-fMRI with blood crushing, it seems that VAPER is the superior approach. The authors could make it clearer why users might want to use the unidirectional crushing instead.

      We understand the reviewer’s concern regarding the directional limitation of bipolar crushing. As noted in the responses above, we have updated the bipolar gradient to include three orthogonal directions instead of a single direction. Furthermore, flow-related signal suppression does not necessarily require a longer time period. Bipolar diffusion gradients have been effectively used to nullify signals from fast-flowing blood, as demonstrated by Boxerman et al. (1995; DOI: 10.1002/mrm.1910340103). Their study showed that vessels with flow velocities producing phase changes greater than π radians due to bipolar gradients experience significant signal attenuation. The critical velocity for such attenuation can be calculated using the formula 1/(2γGΔδ), where γ is the gyromagnetic ratio, G is the gradient strength, δ is the gradient pulse width, and Δ is the time between the two bipolar gradient pulses. In the framework of Boxerman et al. at 1.5T, the critical velocity for a b value of 10 s/mm² is ~8 mm/s, resulting in a ~30% reduction in functional signal. In our 3T study, b values of 6, 7, and 8 s/mm² correspond to critical velocities of 16.8, 15.2, and 13.9 mm/s, respectively. The flow velocities in capillaries and most venules remain well below these thresholds. Notably, in our VN fMRI sequences, bipolar gradients were applied in all three orthogonal directions, whereas in Boxerman et al.'s study, the gradients were applied only in the z-direction. Given the voxel dimensions of 3 × 3 × 7 mm³ in the 1.5T study, vessels within a large voxel are likely oriented in multiple directions, meaning that only a subset of fast-flowing signals would be attenuated. Therefore, our approach is expected to induce greater signal reduction, even at the same b values as those used in Boxerman et al.'s study. We have incorporated this text into the Discussion section of the manuscript.
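
      As a rough illustration of the critical-velocity expression quoted above, one over twice the product of gyromagnetic ratio, gradient strength, pulse separation, and pulse width, the sketch below evaluates it in SI units. It assumes the gyromagnetic ratio is expressed in Hz/T (about 42.577 MHz/T for protons), so that the threshold corresponds to a phase of π radians. The gradient amplitude and timings in the example are hypothetical placeholders, not the parameters of the VN sequence or of Boxerman et al.

```python
GAMMA_BAR_HZ_PER_T = 42.577e6  # proton gyromagnetic ratio divided by 2*pi (assumed units: Hz/T)


def critical_velocity_mm_per_s(gradient_t_per_m: float,
                               pulse_width_s: float,
                               pulse_separation_s: float) -> float:
    """Velocity at which a bipolar gradient pair imparts a phase of pi radians.

    Phase accrued by constant flow along the gradient axis:
        phi = 2*pi * gamma_bar * G * delta * Delta * v
    Setting phi = pi gives v = 1 / (2 * gamma_bar * G * Delta * delta).
    """
    v_m_per_s = 1.0 / (2.0 * GAMMA_BAR_HZ_PER_T * gradient_t_per_m
                       * pulse_separation_s * pulse_width_s)
    return v_m_per_s * 1e3  # convert m/s to mm/s


# Hypothetical example values (NOT taken from the manuscript):
# G = 30 mT/m, pulse width = 3 ms, pulse separation = 10 ms.
v_crit = critical_velocity_mm_per_s(gradient_t_per_m=0.030,
                                    pulse_width_s=0.003,
                                    pulse_separation_s=0.010)
print(f"Critical velocity: {v_crit:.1f} mm/s")
```

      Because the critical velocity is inversely proportional to the gradient strength and to both timing parameters, stronger or longer gradient lobes lower the threshold and suppress slower flow.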

      (3) The comparison with VASO is misleading.

      The authors claim that previous VASO approaches were limited by TRs of 8.2s. The authors might be advised to check the latest literature of the last years.

      Koiso et al. performed whole brain layer-fMRI VASO at 0.8mm at 3.9 seconds (with reliable activation), 2.7 seconds (with unconvincing activation pattern, though), and 2.3 seconds (without activation).

      Also, whole brain layer-fMRI BOLD at 0.5mm and 0.7mm has been previously performed by the Juelich group at TRs of 3.5s (their TR definition is 'fishy' though).

      Koiso, K., Müller, A.K., Akamatsu, K., Dresbach, S., Gulban, O.F., Goebel, R., Miyawaki, Y., Poser, B.A., Huber, L., 2023. Acquisition and processing methods of whole-brain layer-fMRI VASO and BOLD: The Kenshu dataset. Aperture Neuro 34. https://doi.org/10.1101/2022.08.19.504502

      Yun, S.D., Pais‐Roldán, P., Palomero‐Gallagher, N., Shah, N.J., 2022. Mapping of whole‐cerebrum resting‐state networks using ultra‐high resolution acquisition protocols. Human Brain Mapping. https://doi.org/10.1002/hbm.25855

      Pais-Roldan, P., Yun, S.D., Palomero-Gallagher, N., Shah, N.J., 2023. Cortical depth-dependent human fMRI of resting-state networks using EPIK. Front. Neurosci. 17, 1151544. https://doi.org/10.3389/fnins.2023.1151544

      We thank the reviewer for providing these references. While the protocol with a TR of 3.9 seconds in Koiso’s work demonstrated reasonable activation patterns, it was not tested for layer specificity. Given that higher acceleration factors (AF) can cause spatial blurring, a protocol should only be eligible for comparison if layer specificity is demonstrated.

      Secondly, the TRs reported in Koiso’s study pertain only to either the VASO or BOLD acquisition, not the combined CBV-based contrast. To generate CBV-based images, both VASO and BOLD data are required, effectively doubling the TR. For instance, if the protocol with a TR of 3.9 seconds is used, the effective TR becomes approximately 8 seconds. The stable protocol used by Koiso et al. to acquire whole-brain data (94.08 mm along the z-axis) required 5.2 seconds for VASO and 5.1 seconds for BOLD, resulting in an effective TR of 10.3 seconds. The spatial resolution achieved was 0.84 mm isotropic.
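
      The effective-TR arithmetic in the paragraph above can be written out explicitly. This is a minimal sketch that assumes, as the response does, that one CBV-weighted volume needs one VASO and one BOLD acquisition and that their durations simply add; the function name is hypothetical.

```python
def effective_tr_s(vaso_tr_s: float, bold_tr_s: float) -> float:
    """Effective TR of a CBV-weighted volume when VASO and BOLD images are
    acquired one after the other (assumption: the two durations add)."""
    return vaso_tr_s + bold_tr_s


# Protocol quoted with a 3.9 s TR per contrast.
fast_protocol = effective_tr_s(3.9, 3.9)
# Stable whole-brain protocol quoted as 5.2 s (VASO) plus 5.1 s (BOLD).
stable_protocol = effective_tr_s(5.2, 5.1)

print(f"Fast protocol:   {fast_protocol:.1f} s")
print(f"Stable protocol: {stable_protocol:.1f} s")
```

      The first case gives 7.8 s, which the response rounds to approximately 8 seconds, and the second gives the quoted 10.3 seconds.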

      Unfortunately, we could not find the Juelich paper mentioned by the reviewer.

      To have a more comprehensive comparison, we collated relevant literature on brain-wide layer-specific fMRI. We defined brain-wide acquisition as imaging protocols that cover more than half of the human brain, specifically exceeding 55 mm along the superior-inferior axis. We identified five studies and summarized their scan parameters, including effective TR, coverage, and spatial resolution, in Table 1.

      The authors are correct that VASO is not advised as a turn-key method for lower brain areas, incl. Hippocampus and subcortex. However, the authors use this word of caution that is intended for inexperienced "users" as a statement that this cannot be performed. This statement is taken out of context. This statement is not from the academic literature. It's advice for the 40+ user base that wants to perform layer-fMRI as a plug-and-play routine tool in neuroscience usage. In fact, sub-millimeter VASO is routinely being performed by MRI-physicists across all brain areas (including deep brain structures, hippocampus etc). E.g. see Koiso et al. and an overview lecture from a layer-fMRI workshop that I had recently attended: https://youtu.be/kzh-nWXd54s?si=hoIJjLLIxFUJ4g20&t=2401

      In this revision, we decided to focus on cortico-cortical functional connectivity and have removed the LGN-related content. Consequently, the text mentioned by the reviewer was also removed. Nevertheless, we apologize if our original description gave the impression that functional mapping of deep brain regions using VASO is not feasible. The word of caution we used is based on the layer-fMRI blog ( https://layerfmri.com/2021/02/22/vaso_ve/ ) and reflects the challenges associated with this technique, as outlined by experts like Dr. Huber and Dr. Stirnberg.

      According to the information provided, including the video, functional mapping of the hippocampus and amygdala using VASO is indeed possible but remains technically challenging. The short arterial arrival times in these deep brain regions can complicate the acquisition, requiring RF inversion pulses to cover a wider area at the base of the brain. For example, as of 2023, four or more research groups were attempting to implement layer-fMRI VASO in the hippocampus. One such study at 3T required multiple inversion times to account for inflow effects, highlighting the technical complexity of these applications. This is the context in which we used the word of caution. We are not sure whether recent advancements like MAGEC VASO have improved its applicability. As of 2024, we have not identified any published VASO studies specifically targeting deep brain structures such as the hippocampus or amygdala. Therefore, it is difficult to conclude that “sub-millimeter VASO is routinely being performed by MRI physicists on deep brain structures such as the hippocampus.”

      Thus, the authors could embed this phrasing into the context of their own method that they are proposing in the manuscript. E.g. the authors could state whether they think that their sequence has the potential to be disseminated across sites, considering that it requires slow offline reconstruction in Matlab?

      We are enthusiastic about sharing our imaging sequence, provided its usefulness is conclusively established. However, it's important to note that without an online reconstruction capability, such as the ICE, the practical utility of the sequence may be limited. Unfortunately, we currently don’t have the manpower to implement the online reconstruction. Nevertheless, we are more than willing to share the offline reconstruction codes upon request.

      Do the authors think that the results shown in Fig. 6c are suggesting turn-key acquisition of a routine mapping tool? In my humble opinion, it looks like random noise, with most of the activation outside the ROI (in white matter).

      As we mentioned in the ‘general response’ at the beginning of the rebuttal, the POCS method for partial Fourier reconstruction caused the loss of functional features, potentially accounting for the activation in white matter. In this revision, we have modified the pulse sequence, scan protocol and processing pipelines.

      According to the results in Figure 4, stable activation in M1 was observed at the single-subject level across most scan protocols. Yet, the layer-dependent activation profiles in M1 were spatially unstable, irrespective of the application of VN gradients. This spatial instability is not entirely unexpected, as T2*-based contrast is inherently sensitive to various factors that perturb the magnetic field, such as eye movements, respiration, and macrovascular signal fluctuations. Furthermore, ICA-based artifact removal was intentionally omitted in Figure 4 to ensure fair comparisons between protocols, leaving residual artifacts unaddressed. Inconsistency in performing the button-pressing task across sessions may also have contributed to the observed variability. These results suggest that submillimeter-resolution fMRI may not yet be suitable for reliable individual-level layer-dependent functional mapping, unless group-level statistics are incorporated to enhance robustness. We have incorporated this text into the Limitation section of the manuscript.

      (4) The repeatability of the results is questionable.

      The authors perform experiments about the robustness of the method (line 620). The corresponding results are not suggesting any robustness to me. In fact, the layer profiles in Fig. 4c vs. Fig 4d are completely opposite. The location of peaks turns into locations of dips and vice versa.

      The methods are not described in enough detail to reproduce these results.

      The authors mention that their image reconstruction is done "using in-house MATLAB code" (line 634). They do not post a link to github, nor do they say if they share this code.

      We thank the reviewer for the comments regarding reproducibility and data sharing. In response, we have revised the Methods section and elaborated on the technical details to improve clarity and reproducibility.

      Regarding code sharing, we acknowledge that the current in-house MATLAB reconstruction code requires further refinement to improve its readability and usability. Due to limited manpower, we have not yet been able to complete this task. However, we are committed to making the code publicly available and will upload it to GitHub as soon as the necessary resources are available.

      For data sharing, we face logistical challenges due to the large size of the dataset, which spans tens of terabytes. Platforms like OpenNeuro, for example, typically support datasets up to 10TB, making it difficult to share the data in its entirety. Despite this limitation, we are more than willing to share offline reconstruction codes and raw data upon request to facilitate reproducibility.

      Regarding data robustness, we kindly refer the reviewer to our response to the previous comment, where we addressed these concerns in greater detail.

      It is not trivial to get good phase data for fMRI. The authors do not mention how they perform the respective coil-combination.

      No data are shared for reproduction of the analysis.

      Obtaining phase data is relatively straightforward when the images are retrieved directly from raw data. For coil combination, we employed the adaptive coil combination approach described by Walsh et al. (DOI: 10.1002/(sici)1522-2594(200005)43:5<682::aid-mrm10>3.0.co;2-g ). The MATLAB code for this implementation was developed by Dr. Diego Hernando and is publicly available at https://github.com/welton0411/matlab .

      (5) The application of NORDIC is not validated.

      Previous applications of NORDIC at 3T layer-fMRI have resulted in mixed success. When not adjusted for the right SNR regime it can result in artifactual reductions of beta scores, depending on the SNR across layers. The authors could validate their application of NORDIC and confirm that the average layer-profiles are unaffected by the application of NORDIC. Also, the NORDIC version should be explicitly mentioned in the manuscript.

      Akbari, A., Gati, J.S., Zeman, P., Liem, B., Menon, R.S., 2023. Layer Dependence of Monocular and Binocular Responses in Human Ocular Dominance Columns at 7T using VASO and BOLD (preprint). Neuroscience. https://doi.org/10.1101/2023.04.06.535924

      Knudsen, L., Guo, F., Huang, J., Blicher, J.U., Lund, T.E., Zhou, Y., Zhang, P., Yang, Y., 2023. The laminar pattern of proprioceptive activation in human primary motor cortex. bioRxiv. https://doi.org/10.1101/2023.10.29.564658

      We appreciate the reviewer’s suggestion. To validate the application of NORDIC denoising in our study, we compared the BOLD activation maps before and after denoising in the visual and motor cortices, as well as the depth-dependent activation profiles in M1. These results are presented in Figure 3. The activation patterns in the denoised maps were consistent with those in the non-denoised maps but exhibited higher statistical significance. Notably, BOLD activation within M1 was only observed after NORDIC denoising, underscoring the necessity of this approach. Figure 3c shows the depth-dependent activation profiles in M1, highlighted by the green contours in Figure 3b. Both denoised and non-denoised profiles followed similar trends; however, as expected, the non-denoised profile exhibited larger confidence intervals compared to the NORDIC-denoised profile. These results confirm that NORDIC denoising enhances sensitivity without introducing distortions in the functional signal. The corresponding text has been incorporated into the Results section.

      Regarding the implementation details of NORDIC denoising, the reconstructed images were denoised using a g-factor map (function name: NIFTI_NORDIC). The g-factor map was estimated from the image time series, and the input images were complex-valued. The width of the smoothing filter for the phase was set to 10, while all other hyperparameters were retained at their default values. This information has been integrated into the Methods section for clarity and reproducibility.

      Reviewer #2 (Public Review):

      This study developed a setup for laminar fMRI at 3T that aimed to get the best from all worlds in terms of brain coverage, temporal resolution, sensitivity to detect functional responses, and spatial specificity. They used a gradient-echo EPI readout to facilitate sensitivity, brain coverage and temporal resolution. The former was additionally boosted by NORDIC denoising and the latter two were further supported by parallel-imaging acceleration both in-plane and across slices. The authors evaluated whether the implementation of velocity-nulling (VN) gradients could mitigate macrovascular bias, known to hamper the laminar specificity of gradient-echo BOLD.

      The setup allows for 0.9 mm isotropic acquisitions with large coverage at a reasonable TR (at least for block designs) and the fMRI results presented here were acquired within practical scan-times of 12-18 minutes. Also, in terms of the availability of the method, it is favorable that it benefits from lower field strength (additional time for VN-gradient implementation, afforded by longer gray matter T2*).

      The well-known double peak feature in M1 during finger tapping was used as a test-bed to evaluate the spatial specificity. They were indeed able to demonstrate two distinct peaks in group-level laminar profiles extracted from M1 during finger tapping, which was largely free from superficial bias. This is rather intriguing as, even at 7T, clear peaks are usually only seen with spatially specific non-BOLD sequences. This is in line with their simple simulations, which nicely illustrated that, in theory, intravascular macrovascular signals should be suppressible with only minimal suppression of microvasculature when small b-values of the VN gradients are employed. However, the authors do not state how ROIs were defined making the validity of this finding unclear; were they defined from independent criteria or were they selected based on the region mostly expressing the double peak, which would clearly be circular? In any case, results are based on a very small sub-region of M1 in a single slice - it would be useful to see the generalizability of superficial-bias-free BOLD responses across a larger portion of M1.

      We appreciate and understand the reviewer’s concerns. Given the small size of the hand knob region within M1 and its intersubject variability in location, defining this region automatically remains challenging. However, we applied specific criteria to minimize bias during the delineation of M1: 1) the hand knob region was required to be anatomically located in the precentral sulcus or gyrus; 2) it needed to exhibit consistent BOLD activation across the majority of testing conditions; and 3) the region was expected to show BOLD activation in the deep cortical layers under the condition of b = 0 and TE = 30 ms. Once the boundaries across cortical depth were defined, the gray matter boundaries of the hand knob region were delineated based on the T1-weighted anatomical image and the cortical ribbon mask, without reference to the BOLD activation map, to minimize potential bias in manual delineation. Based on the new criteria, the resulting depth-dependent profiles, as shown in Figure 4, are no longer superficial-bias-free.

      As repeatedly mentioned by the authors, a laminar fMRI setup must demonstrate adequate functional sensitivity to detect (in this case) BOLD responses. The sensitivity evaluation is unfortunately quite weak. It is mainly based on the argument that significant activation was found in a challenging sub-cortical region (LGN). However, it was a single participant, the activation map was not very convincing, and the demonstration of significant activation after considerable voxel-averaging is inadequate evidence to claim sufficient BOLD sensitivity. How well sensitivity is retained in the presence of VN gradients, high acceleration factors, etc., is therefore unclear. The ability of the setup to obtain meaningful functional connectivity results is reassuring, yet, more elaborate comparison with e.g., the conventional BOLD setup (no VN gradients) is warranted, for example by comparison of tSNR, quantification and comparison of CNR, illustration of unmasked-full-slice activation maps to compare noise-levels, comparison of the across-trial variance in each subject, etc. Furthermore, as NORDIC appears to be a cornerstone to enable submillimeter resolution in this setup at 3T, it is critical to evaluate its impact on the data through comparison with non-denoised data, which is currently lacking.

      We appreciate the reviewer’s comments and acknowledge that the LGN results from a single participant were not sufficiently convincing. In this revision, we have removed the LGN-related results and focused on cortico-cortical FC. To evaluate data quality, we opted to present BOLD activation maps rather than tSNR, as high tSNR does not necessarily translate to high functional significance. In Figure 3, we illustrate the effect of NORDIC denoising, including activation maps and depth-dependent profiles. Figure 4 presents activation maps acquired under different TE and b values, demonstrating that VN gradients effectively reduce the bias toward the pial surface without altering the overall activation patterns. The results in Figure 4 and Figure 5 provide evidence that VN gradients retain sensitivity while reducing superficial bias. The ability of the setup to obtain meaningful FC results was validated through seed-based analyses, identifying distinct connectivity patterns in the superficial and deep layers of the primary motor cortex (M1), with significant inter-layer differences (see Figure 7). Further analyses with a seed in the primary sensory cortex (S1) demonstrated the reliability of the method (see Figure 8). For further details on the results, including the impact of VN gradients and NORDIC denoising, please refer to Figures 3 to 8 in the Results section.

      Additionally, we acknowledge the limitations of our current protocol for submillimeter-resolution fMRI at the individual level. We found that robust layer-dependent functional mapping often requires group-level statistics to enhance reliability. This issue has been discussed in detail in the Limitations section.

      The proposed setup might potentially be valuable to the field, which is continuously searching for techniques to achieve laminar specificity in gradient echo EPI acquisitions. Nonetheless, the above considerations need to be tackled to make a convincing case.

      Reviewer #3 (Public Review):

      Summary:

      The authors are looking for a spatially specific functional brain response to visualise non-invasively with 3T (clinical field strength) MRI. They propose a velocity-nulled weighting to remove the signal from draining veins in a submillimeter multiband acquisition.

      Strengths:

      - This manuscript addresses a real need in the cognitive neuroscience community interested in imaging responses in cortical layers in-vivo in humans.

      - An additional benefit is the proposed implementation at 3T, a widely available field strength.

      Weaknesses:

      - Although the VASO acquisition is discussed in the introduction section, the VN-sequence seems closer to diffusion-weighted functional MRI. The authors should make it more clear to the reader what the differences are, and how results are expected to differ. Generally, it is not so clear why the introduction is so focused on the VASO acquisition (which, curiously, lacks a reference to Lu et al 2013). There are many more alternatives to BOLD-weighted imaging for fMRI. CBF-weighted ASL and GRASE have been around for a while, ABC and double-SE have been proposed more recently.

      The major distinction between diffusion-weighted fMRI (DW-fMRI) and our methodology lies in the b-value employed. DW-fMRI typically measures cellular swelling using b-values greater than 1000 s/mm<sup>2</sup> (e.g., 1800 s/mm<sup>2</sup>). In contrast, our VN-fMRI approach measures hemodynamic responses by employing smaller b-values specifically designed to suppress signals from fast-flowing draining veins rather than detecting microstructural changes.

      Regarding other functional contrasts, we agree that more layer-dependent fMRI approaches should be mentioned. In this revision, we have expanded the Introduction section to include discussions of the double spin-echo approach and CBV-based methods, such as MT-weighted fMRI, VAPER, ABC, and CBF-based method ASL. Additionally, the reference to Lu et al. (2013) has been cited in the revised manuscript. The corresponding text has been incorporated into the Introduction section to provide a more comprehensive overview of alternative functional imaging techniques.

      - The comparison in Figure 2 for different b-values shows % signal changes. However, as the baseline signal changes dramatically with added diffusion weighting, this is rather uninformative. A plot of t-values against cortical depth would be much more insightful.

      - Surprisingly, the %-signal change for a b-value of 0 is not significantly different from 0 in the gray matter. This raises some doubts about the task or ROI definition. A finger-tapping task should reliably engage the primary motor cortex, even at 3T, and even in a single participant.

      - The BOLD weighted images in Figure 3 show a very clear double-peak pattern. This contradicts the results in Figure 2 and is unexpected given the existing literature on BOLD responses as a function of cortical depth.

      - Given that data from Figures 2, 3, and 4 are derived from a single participant each, order and attention effects might have dramatically affected the observed patterns. Especially for Figure 4, neither BOLD nor VN profiles are really different from 0, and without statistical values or inter-subject averaging, these cannot be used to draw conclusions from.

      We appreciate the reviewer’s suggestions. In this revision, we have made significant updates to the participant recruitment, scan protocol, data processing, and M1 delineation. Please refer to the "General Responses" at the beginning of the rebuttal and the first response to Reviewer #2 for more details.

      Previously, the variation in depth-dependent profiles was calculated across upscaled voxels within a specific layer. However, due to the small size of the hand knob region, the number of within-layer voxels was limited, resulting in inaccurate estimations of signal variation. In the revised manuscript, the signal was averaged within each layer before performing the GLM analysis, and signal variation was calculated using the temporal residuals. The technical details of these changes are described in the "Materials and Methods" section. Furthermore, while the initial submission used percentage signal change for the profiles of M1, the dramatic baseline fluctuations observed previously are no longer an issue after the modifications. For this reason, we retained the use of percentage signal change to present the depth-dependent profiles. After these adjustments, the profiles exhibited a bias toward the pial surface, particularly in the absence of VN gradients.
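
      As a rough illustration of the layer-wise analysis described above, the following Python sketch averages voxel time series within each layer, fits a GLM by ordinary least squares, and derives the uncertainty of the percent signal change from the temporal residuals. It is not the pipeline used in the manuscript: the function name `layer_glm`, the 0/1 boxcar regressor with an intercept, and the use of the intercept as the baseline are all assumptions made for this example.

```python
import numpy as np

def layer_glm(voxel_ts, layer_labels, regressor):
    """Hypothetical helper: per-layer percent signal change and its standard error.

    voxel_ts     : array (n_voxels, n_timepoints), voxel time series in the ROI
    layer_labels : array (n_voxels,), integer layer index of each voxel
    regressor    : array (n_timepoints,), task regressor (assumed 0/1 boxcar)
    """
    n_t = voxel_ts.shape[1]
    # Design matrix: task regressor plus intercept (baseline).
    X = np.column_stack([regressor, np.ones(n_t)])
    xtx_inv = np.linalg.inv(X.T @ X)
    dof = n_t - X.shape[1]

    results = {}
    for layer in np.unique(layer_labels):
        # Average across voxels within the layer BEFORE fitting the GLM.
        y = voxel_ts[layer_labels == layer].mean(axis=0)
        beta = xtx_inv @ X.T @ y
        # Signal variation is estimated from the temporal residuals,
        # not from the spread across voxels.
        resid = y - X @ beta
        sigma2 = resid @ resid / dof
        se_task = np.sqrt(sigma2 * xtx_inv[0, 0])
        # Percent signal change relative to the fitted baseline (assumption).
        psc = 100.0 * beta[0] / beta[1]
        psc_se = 100.0 * se_task / beta[1]
        results[int(layer)] = (psc, psc_se)
    return results
```

      Because the error estimate comes from the time dimension, it does not depend on how many voxels happen to fall within a layer, which is the limitation of the across-voxel estimate noted above.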

      - In Figure 5, a phase regression is added to the data presented in Figure 4. However, for a phase regression to work, there has to be a (macrovascular) response to start with. As none of the responses in Figure 4 are significant for the single participant dataset, phase regression should probably not have been undertaken. In this case, the functional 'responses' appear to increase with phase regression, which is counterintuitive and deserves an explanation.

      We agree with the reviewer’s argument. In the revised results, the issues mentioned by the reviewer are largely diminished. The updated analyses demonstrate that phase regression effectively reduces superficial bias, as shown in Figures 4 and 5.

      - Consistency of responses is indeed expected to increase by a removal of the more variable vascular component. However, the microvascular component is always expected to be smaller than the combination of microvascular + macrovascular responses. Note that the use of %signal changes may obscure this effect somewhat because of the modified baseline. Another expected feature of BOLD profiles containing both micro- and macrovasculature is the draining towards the cortical surface. In the profiles shown in Figure 7, this is completely absent. In the group data, no significant responses to the task are shown anywhere in the cortical ribbon.

      We agree with the reviewer’s comments. In the revised manuscript, the results have been substantially updated to address the concerns raised. The original Figure 7 is no longer relevant and has been removed.

      - Although I'd like to applaud the authors for their ambition with the connectivity analysis, I feel that acquisitions that are so SNR starved as to fail to show a significant response to a motor task should not be used for brain wide directed connectivity analysis.

      We appreciate the reviewer’s comments and share the concern about SNR limitations. In the updated results presented in Figure 5, the activation patterns in the visual cortex were consistent across TEs and b values. At the motor cortex, stable activation in M1 was observed at the single-subject level across most scan protocols. However, the layer-dependent activation profiles in M1 exhibited spatial instability, irrespective of the application of VN gradients. This spatial instability is not entirely unexpected, as T2*-based contrast is inherently sensitive to factors that perturb the magnetic field, such as eye movements, respiration, and macrovascular signal fluctuations. Additionally, ICA-based artifact removal was intentionally omitted in Figure 4 to ensure fair comparisons across protocols, leaving some residual artifacts unaddressed. Variability in task performance during button-pressing sessions may have further contributed to the observed inconsistencies.

      Although these findings suggest that submillimeter-resolution fMRI may not yet be reliable for individual-level layer-dependent functional mapping, the group-level FC analyses can still yield robust results. In Figure 7, group-level statistics revealed distinct functional connectivity (FC) patterns associated with superficial and deep layers in M1. These FC maps exhibited significant differences between layers, demonstrating that VN fMRI enhances inter-layer independence. Additional FC analyses with a seed placed in S1 further validated these findings (see Figure 8).

      The claim of specificity is supported by the observation of the double-peak pattern in the motor cortex, previously shown in multiple non-BOLD studies. However, this same pattern is shown in some of the BOLD weighted data, which seems to suggest that the double-peak pattern is not solely due to the added velocity nulling gradients. In addition, the well-known draining towards the cortical surface is not replicated for the BOLD-weighted data in Figures 3, 4, or 7. This puts some doubt about the data actually having the SNR to draw conclusions about the observed patterns.

      We appreciate the reviewer’s comments. In the updated results, the efficacy of the VN gradients is evident near the pial surface, as shown in Figures 4 and 5. In Figure 4, comparing the second and third columns (b = 0 and b = 6 s/mm<sup>2</sup>, respectively, at TE = 38 ms), the percentage signal change in the superficial layers is generally lower with b = 6 s/mm<sup>2</sup> than with b = 0. This indicates that VN gradient-induced signal suppression is more pronounced in the superficial layers. Additionally, in Figure 5, the VN gradients effectively suppressed macrovascular signals, as highlighted by the blue circles. These observations support the role of VN gradients in enhancing specificity by reducing superficial bias and macrovascular contamination. Furthermore, a bias toward the cortical surface was observed in the updated results in Figure 4.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      (1) L141: "depth dependent" is slightly misleading here. It could be misunderstood to suggest that the authors are assessing how spatial specificity varies as a function of depth. Rather, they are assessing spatial specificity based on depth-dependent responses (double peak feature). Perhaps "layer-dependent spatial specificity" could be substituted with laminar specificity?

      We thank the reviewer for the suggestion. The term “depth dependent” has been replaced by “layer dependent” in the revised manuscript.

      (2) L146-149: these do not validate spatial specificity.

      The original text is removed.

      (3) L180: Maybe helpful to describe what the b-value is to assist unfamiliar readers.

      We have clarified the b-value as “the strength of the bipolar diffusion gradients” where it is first mentioned in the manuscript.

      (4) Figure 1B: I think it would be appropriate with a sentence of how the authors define micro/macrovasculature. Figure 1B seems to suggest that large ascending veins are considered microvascular which I believe is a bit unconventional. Nevertheless, as long as it is clearly stated, it should be fine.

      In our context, macrovasculature refers to vessels that are distal to neural activation sites and contribute to extravascular contamination. These vessels are typically larger in size (e.g., > 0.1 mm in diameter) and exhibit faster flow rates (e.g., > 10 mm/s).

      (5) I think the authors could be more upfront with the point about non-suppressed extravascular effects from macrovasculature, which was briefly mentioned in the discussion. It could already be highlighted in the introduction or theory section.

      We thank the reviewer for the suggestions. We have expanded the discussion of extravascular effects from macrovasculature in both the Introduction (5th paragraph) and Discussion (3rd paragraph) sections.

      (6) The phase regression figure feels a bit misplaced to me. If the authors agree: rather than showing the TE-dependency of the effect of phase regression, it may be more relevant for the present study to compare the conventional setup with phase regression, with the VN setup without phase regression. I.e., to show how the proposed setup compares to existing 3T laminar fMRI studies.

      In this revision, both the TE-dependent and VN-dependent effects of phase regression were investigated. The results in Figure 4 and Figure 5 demonstrated that phase regression effectively suppresses macrovascular contributions primarily near the gray matter/CSF boundary, irrespective of TE or the presence of VN gradients.
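
      For readers unfamiliar with phase regression, a minimal Python sketch of the OLS variant for a single voxel follows. It is a simplified illustration rather than the code used in the study: the function name `phase_regress_ols` is hypothetical, and the inputs are assumed to be an already motion-corrected magnitude time series and an unwrapped phase time series.

```python
import numpy as np

def phase_regress_ols(magnitude, phase):
    """Hypothetical helper: remove the part of the magnitude signal that is
    linearly predicted by the phase signal (OLS fit of magnitude on phase).

    magnitude, phase : 1-D arrays of equal length (one voxel's time series)
    Returns the corrected magnitude time series and the fitted slope.
    """
    mag = np.asarray(magnitude, dtype=float)
    ph = np.asarray(phase, dtype=float)
    # Centre the phase so that the mean magnitude (baseline) is preserved.
    ph_c = ph - ph.mean()
    slope = (ph_c @ (mag - mag.mean())) / (ph_c @ ph_c)
    # Subtract the phase-coupled component, attributed to large vessels.
    corrected = mag - slope * ph_c
    return corrected, slope
```

      The rationale is that task-locked phase changes are expected mainly near large, oriented vessels, so the component of the magnitude signal that covaries with phase is treated as macrovascular and removed.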

      (7) L520: It might be beneficial to also cite the large body of other laminar studies showing the double peak feature to underscore that it is highly robust, which increases its relevance as a test-bed to assess spatial specificity.

      We agree. Additional studies have been cited (Chai et al., 2020; Huber et al., 2017a; Knudsen et al., 2023; Priovoulos et al., 2023).

      (8) L557: The argument that only one participant was assessed to reduce inter-subject variability is hard to buy. If significant variability exists across subjects, this would be highly relevant to the authors and something they would want to capture.

      We thank the reviewer for the suggestions. In this revision, we have increased the number of participants to 4 for protocol development and 14 for resting-state functional connectivity analysis, allowing us to better assess and account for inter-subject variability.

      (9) L637: add download link and version number.

      The download link has been added as requested. The version number is not applicable.

      (10) L638: How was the phase data coil-combined?

      The reconstructed multi-channel data, which were of complex values, were combined using the adaptive combination method (Walsh et al.; DOI: 10.1002/(sici)1522-2594(200005)43:5<682::aid-mrm10>3.0.co;2-g). The MATLAB code for this implementation was developed by Dr. Diego Hernando and is publicly available at https://github.com/welton0411/matlab . The phase data were then extracted using the MATLAB function ‘angle’.

      (11) L639: Why was the smoothing filter parameter changed (other parameters were default)?

      The smoothing filter parameter was set based on the suggestion provided in the help comments of the NIFTI_NORDIC function:

      function  NIFTI_NORDIC(fn_magn_in,fn_phase_in,fn_out,ARG)

      % fMRI

      %

      %  ARG.phase_filter_width=10;

      In other words, we simply followed the recommendation outlined in the NIFTI_NORDIC function’s documentation.

      (12) I assume the phase data was motion corrected after transforming to real and imaginary components and using parameters estimated from magnitude data? Maybe add a few sentences about this.

      Prior to phase regression, the time series of real and imaginary components were subjected to motion correction, followed by phase unwrapping. The phase regression was incorporated early in the data processing pipeline to minimize the discrepancy in data processing between magnitude and phase images (Stanley et al., 2021).
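
      The conversions involved in this step can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper names are hypothetical, the motion correction itself (spatial resampling of the real and imaginary volumes) is not shown, and unwrapping is assumed to be temporal, along the last axis.

```python
import numpy as np

def to_real_imag(magnitude, phase):
    """Split magnitude/phase data into real and imaginary components,
    which can be spatially interpolated without phase-wrap artifacts."""
    z = magnitude * np.exp(1j * phase)
    return z.real, z.imag

def to_phase(real, imag):
    """Recombine (e.g. motion-corrected) components into wrapped phase,
    analogous to MATLAB's angle()."""
    return np.angle(real + 1j * imag)

def unwrap_time(phase, axis=-1):
    """Temporal phase unwrapping along the time axis (assumed last)."""
    return np.unwrap(phase, axis=axis)
```

      Working on real and imaginary components avoids interpolating across 2π discontinuities, which would otherwise produce spurious intermediate phase values at wrap boundaries.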

      (13) Was phase regression applied with e.g., a deming model, which accounts for noise on both the x and y variable? In my experience, this makes a huge difference compared with regular OLS.

      We appreciate the reviewer’s insightful comment. We are aware that noise is present in both the magnitude and phase data, and that linear Deming regression would therefore be a good fit for phase regression (Stanley et al., 2021). To perform Deming regression, however, the ratio of magnitude error variance to phase error variance must be predefined. In our initial tests, we found that the regression results were sensitive to this ratio. To avoid potential confounding, we opted to use OLS regression for the current analysis. However, we agree that a Deming model could enhance the efficacy of phase regression if the ratio could be determined objectively and properly.
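
      To make the dependence on the variance ratio concrete, the following Python sketch implements the standard closed-form Deming slope next to the OLS slope. It is an illustration only, not the analysis code from the study; the function names are hypothetical, and `delta` is assumed to denote the ratio of the error variance in y (magnitude) to the error variance in x (phase).

```python
import numpy as np

def ols_slope(x, y):
    """OLS slope of y on x; assumes x is measured without error."""
    return np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)

def deming_slope(x, y, delta):
    """Closed-form Deming regression slope.

    delta : assumed ratio var(error in y) / var(error in x); must be
            chosen in advance, which is the difficulty described above.
    Note: undefined when the sample covariance of x and y is zero.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    sxx = np.var(x, ddof=1)
    syy = np.var(y, ddof=1)
    sxy = np.cov(x, y, ddof=1)[0, 1]
    d = syy - delta * sxx
    return (d + np.sqrt(d * d + 4.0 * delta * sxy * sxy)) / (2.0 * sxy)
```

      As `delta` grows large, the Deming slope approaches the OLS slope, whereas smaller values assign more of the noise to the phase and yield a steeper fit. Evaluating `deming_slope` over a range of `delta` values on the same data is a simple way to gauge how sensitive the regression is to this choice.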

      (14) Figure 2: What is error bar reflecting? I don't think the across-voxel error, as also used in Figure 4, is super meaningful as it assumes the same response of all voxels within a layer (might be alright for such a small ROI). Would it be better to e.g. estimate single-trial response magnitude (percent signal change) and assess variability across? Also, it is not obvious to me why b=30 was chosen. The authors argue that larger values may kill signal, but based on this Figure in isolation, b=48 did not have smaller response magnitudes (larger if anything).

      We agreed with the reviewer’s opinion on the across-voxel error. In the revised manuscript, the signal was averaged within each layer before performing the GLM analysis, and signal variation was calculated using the temporal residuals. The technical details of these changes are described in the "Materials and Methods" section.

      Additionally, the bipolar diffusion gradients were modified from a single direction to three orthogonal directions. As a result, the questions and results related to b=30 or b=48 are no longer applicable.

      (15) Figure 5: would be informative to quantify the effect of phase regression over a large ROI and evaluate reduction in macrovascular influence from superficial bias in laminar profiles.

      We appreciate the reviewer’s suggestion. In the revised manuscript, the reduction in macrovascular influence from superficial bias across a large ROI is displayed in Figure 5. Additionally, the impact on laminar profiles is demonstrated in Figure 4.

      (16) L406-408: What kind of robustness?

      We acknowledge that describing the protocol as “robust” was an overstatement. The updated results indicate that the current protocol for submillimeter fMRI may not yet be suitable for reliable individual-level layer-dependent functional mapping. However, group-level functional connectivity (FC) analyses demonstrated clear layer-specific distinctions with VN fMRI, which were not evident in conventional fMRI. These findings highlight the enhanced layer specificity achievable with VN fMRI.

      (17) Figure 8: I think C) needs pointers to superficial, middle, and deep layers? Why is it not in the same format as in Figure 9C? The discussion of the FC results could benefit from more references supporting that these observations are in line with the literature.

      In the revised results, the layer pooling shown in Figure 9c has been removed, making the question regarding format alignment no longer applicable. Additionally, references supporting the FC results have been added to the revised Discussion section (7th paragraph).

      (18) L456-457: But correlation coefficients may also be biased by different CNR across layers.

      That is correct. In the updated FC results in Figures 7 to 9, we used group-level statistics rather than correlation coefficients.

      Reviewer #3 (Recommendations For The Authors):

      The results in Figure 2-6 should be repeated over, or averaged over, a (small) group of participants. N=6 is usual in this field. I would seriously reconsider the multiband acceleration - the acquisition seemingly cannot support the SNR hit.

      A few more specific points are given below:

      (1) Abstract: The sentence about LGN in the abstract came for me out of the blue - why would LGN be important here, it's not even a motor network node? Perhaps the aims of the study should be made more clear - if it's about networks as suggested earlier then a network analysis result would be expected too. Expanding the directed FC findings would improve the logical flow of the abstract. Given the many concerns, removing the connectivity analysis altogether would also be an option.

      We thank the reviewer for the suggestions. The LGN-related results indeed diluted the focus of this study and have been completely removed in this revision.

      (2) Line 105: in addition to the VASO method, ..

      The corresponding text has been revised, and as a result, the reviewer’s suggestion is no longer applicable.

      (3) If out of the set MB 4 / 5 / 6 MB4 was best, why did the authors not continue with a comparison including MB3 and MB2? It seems to me unlikely that the MB4 acquisition is actually optimal.

      Results: We appreciate the reviewer’s suggestions. In this revision, we decreased the MB factor to 3, as it allowed us to increase the in-plane acceleration rate to 3, thereby shortening the TE. The resulting sensitivity for both individual and group-level results is detailed in earlier responses, such as the response to Q16 for Reviewer #2.

      (4) The formatting of the references is occasionally flawed, including first names and/or initials. Please consider using a reliable reference manager.

      We used Zotero as our reference manager in this revision to ensure consistency and accuracy. The references have been formatted according to the APA style.

      (5) In the caption of Figure 5, corrected and uncorrected p values are identical. What multiple comparisons correction was made here? A multiple comparisons correction over voxels (as is standard) would usually lead to a cut-off ~z=3.2. That would remove most of the 'responses' shown in figure 5.

      We appreciate the reviewer’s comment. The original results presented in Figure 5 have been removed in the revised manuscript, making this comment no longer applicable.

    1. eLife Assessment

      In this useful study, Millard et al. assessed the effects of nicotine on pain sensitivity and peak alpha frequency (PAF). The evidence shown is incomplete to support the key claim that nicotine modulates PAF or pain sensitivity, considering the effect sizes observed. This raises the question of whether the chosen experimental intervention was the most suitable approach for investigating their research question. Nonetheless, the work can be incorporated into the literature investigating the relationship between nicotine and pain, and could be of broad interest to pain researchers.

    2. Reviewer #1 (Public review):

      Summary:

      In this study, Millard and colleagues investigated if the analgesic effect of nicotine on pain sensitivity, assessed with two pain models, is mediated by Peak Alpha Frequency (PAF) recorded with resting state EEG. The authors found indeed that nicotine (4 mg, gum) reduced pain ratings during phasic heat pain but not cuff pressor algometry compared to placebo conditions. Nicotine also increased PAF (globally). However, mediation analysis revealed that the reduction in pain ratings elicited by the phasic heat pain after taking nicotine was not mediated by the changes in PAF. Also, the authors only partially replicated the correlation between PAF and pain sensitivity at baseline (before nicotine treatment). At the group-level no correlation was found, but an exploratory analysis showed that the negative correlation (lower PAF, higher pain sensitivity) was present in males but not in females. The authors discuss the lack of correlation. In general, the study is rigorous, methodology is sound and the paper is well written. Results are compelling and sufficiently discussed.

      Strengths:

      Strengths of this study are the pre-registration, the proper sample size calculation, and the data analysis, as well as the demonstration of an analgesic effect of nicotine and a change in PAF.

      Weaknesses:

      It would even be more convincing if they had manipulated PAF directly.

    3. Reviewer #2 (Public review):

      Summary:

      The study by Millard et al. investigates the effect of nicotine on alpha peak frequency and pain in a very elaborate experimental design. According to the statistical analysis, the authors found a factor-corrected significant effect for prolonged heat pain but not for alpha peak frequency in response to the nicotine treatment.

      Strengths:

      I very much like the study design and that the authors followed their research line by aiming to provide a complete picture of the pain-related cortical impact of alpha peak frequency. This is very important work, even in the absence of any statistical significance. I also appreciate the preregistration of the study and the well-written and balanced introduction.

      Weaknesses:

      The weakness of the study revolves around two aspects:

      (1) Source separation (ICA or similar) would have been more appropriate than electrode ROIs to extract the alpha signal. By using a source separation approach, different sources of alpha (mu, occipital alpha, laterality) could be disentangled.

      (2) There is also a suggestion in the literature (and in the manuscript) that nicotine treatment may not work as intended. Instead, the authors' decision to use nicotine to modulate peak alpha frequency and pain was based on other, inappropriate work on chronic pain and chronic smokers. In the present study, the authors use nicotine treatment and transient painful stimulation in nonsmokers. The unfortunate decision to use nicotine severely hampered the authors' goal of the study.

      Impact: The impact of the study could be to show what did not work to answer the authors' research questions. The study would have more impact with a more appropriate pain intervention model and an analysis strategy that untangles the different alpha sources.

    4. Reviewer #3 (Public review):

      In this manuscript, Millard et al. investigate the effects of nicotine on pain sensitivity and peak alpha frequency (PAF) in resting state EEG. To this end, they ran a randomized, double-blind, placebo-controlled experiment involving 62 healthy adults who received either 4 mg nicotine gum (n=29) or placebo (n=33). Prolonged heat and pressure were used as pain models. Resting state EEG and pain intensity (assessed with a visual analog scale) were measured before and after the intervention. Additionally, several covariates (sex at birth, depression and anxiety symptoms, stress, sleep quality, among others) were recorded. Data were analyzed using ANCOVA-equivalent two-wave latent change score models, as well as repeated measures analysis of variance. Results do not show experimentally relevant changes of PAF or pain intensity scores for either of the prolonged pain models due to nicotine intake.

      The main strengths of the manuscript are its solid conceptual framework and the thorough experimental design. The researchers make a good case in the introduction and discussion for the need to further investigate the association of PAF and pain sensitivity. Furthermore, they proceed to carefully describe every aspect of the experiment in great detail, which is excellent for reproducibility purposes. Finally, they analyze the data from different perspectives and provide an extensive report of their results.

      There are relevant weaknesses to highlight. Firstly, the authors preregistered the study and the analysis plan, but the preregistration does not contain an estimation of the expected effect sizes or the rationale for the selected sample size. Furthermore, the authors interpret their results in a way that is not supported by the evidence (which is particularly evident in the abstract and the first paragraph of the discussion). Even though some of the differences are statistically significant (e.g., global PAF, pain intensity ratings during heat pain), these differences are far from being experimentally or clinically relevant. The effect sizes observed are not sufficiently large to consider that pain sensitivity was modulated by the nicotine intake, which puts into question all the answers to the research questions posed in the study. The authors attempt to nuance this throughout the discussion, but in a way that is not compatible with the main claims.

    1. eLife Assessment

      This study presents important findings on the role of CXXC-finger protein 1 in regulatory T cell gene regulation and function. The evidence supporting the authors' claims is convincing, with mostly state-of-the-art technology. The work will be of relevance to immunologists interested in regulatory T cell biology and autoimmunity.

    2. Reviewer #1 (Public review):

      Summary:

      This work investigated the role of CXXC-finger protein 1 (CXXC1) in regulatory T cells. CXXC1-bound genomic regions largely overlap with Foxp3-bound regions and regions with H3K4me3 histone modifications in Treg cells. CXXC1 and Foxp3 interact with each other, as shown by co-immunoprecipitation. Mice with Treg-specific CXXC1 knockout (KO) succumb to lymphoproliferative diseases between 3 and 4 weeks of age, similar to Foxp3 KO mice. Although the immune suppression function of CXXC1 KO Treg is comparable to WT Treg in an in vitro assay, these KO Tregs failed to suppress autoimmune diseases such as EAE and colitis in Treg transfer models in vivo. This is partly due to the diminished survival of the KO Tregs after transfer. CXXC1 KO Tregs do not have an altered DNA methylation pattern; instead, they display weakened H3K4me3 modifications within the broad H3K4me3 domains, which contain a set of Treg signature genes. These results suggest that CXXC1 and Foxp3 collaborate to regulate Treg homeostasis and function by promoting Treg signature gene expression through maintaining H3K4me3 modification.

      Strengths:

      Epigenetic regulation of Treg cells has been a constantly evolving area of research. The current study revealed CXXC1 as a previously unidentified epigenetic regulator of Tregs. The strong phenotype of the knockout mouse supports the critical role CXXC1 plays in Treg cells. Mechanistically, the link between CXXC1 and the maintenance of broad H3K4me3 domains is also a novel finding.

      Weaknesses:

      The authors addressed the reviewer's critiques fully in the revised manuscript.

    3. Reviewer #2 (Public review):

      FOXP3 has been known to form diverse complexes with different transcription factors and enzymes responsible for epigenetic modifications, but how extracellular signals timely regulate FOXP3 complex dynamics remains to be fully understood. Histone H3K4 tri-methylation (H3K4me3) and CXXC finger protein 1 (CXXC1), which is required to regulate H3K4me3, also remain to be fully investigated in Treg cells. Here, Meng et al. performed a comprehensive analysis of H3K4me3 CUT&Tag assay on Treg cells and a comparison of the dataset with the FOXP3 ChIP-seq dataset revealed that FOXP3 could facilitate the regulation of target genes by promoting H3K4me3 deposition. Moreover, CXXC1-FOXP3 interaction is required for this regulation. They found that specific knockdown of Cxxc1 in Treg leads to spontaneous severe multi-organ inflammation in mice and that Cxxc1-deficient Treg exhibits enhanced activation and impaired suppression activity. In addition, they have also found that CXXC1 shares several binding sites with FOXP3 especially on Treg signature gene loci, which are necessary for maintaining homeostasis and identity of Treg cells.

      Comments on revisions:

      The authors have fully addressed the reviewers' comments and questions.

    4. Reviewer #3 (Public review):

      In the report entitled "CXXC-finger protein 1 associates with FOXP3 to stabilize homeostasis and suppressive functions of regulatory T cells", the authors demonstrated that Cxxc1-deletion in Treg cells leads to the development of severe inflammatory disease with impaired suppressive function. Mechanistically, CXXC1 interacts with Foxp3 and regulates the expression of key Treg signature genes by modulating H3K4me3 deposition. Their findings are interesting and significant.

      Comments on revisions:

      In the revised manuscript, the authors have responded well to all the concerns reviewers raised. The manuscript has further improved.

    1. eLife Assessment

      The authors use single molecule imaging and in vivo loop-capture genomic approaches to investigate estrogen mediated enhancer-target gene activation in human cancer cells. These potentially important results suggest that ER-alpha can, with a temporal delay, activate a non-target gene TFF3, which is in proximity to the main target gene TFF1, even though the estrogen responsive enhancer does not loop with the TFF3 promoter. To explain these results, the authors invoke a transcriptional condensate model. The claim of a temporal delay and effects of the target gene transcription on the non-target gene expression are supported by solid evidence but there is no direct evidence of the role of a condensate in mediating this effect. The reviewers appreciate that the authors have done a lot of work to strengthen the study. This work will be of interest to those studying transcriptional gene regulation and hormone-aggravated cancers.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript by Bohra et al. describes the indirect effects of ligand-dependent gene activation on neighboring non-target genes. The authors utilized single-molecule RNA-FISH (targeting both mature and intronic regions), 4C-seq, and enhancer deletions to demonstrate that the non-enhancer-targeted gene TFF3, located in the same TAD as the target gene TFF1, alters its expression when TFF1 expression declines at the end of the estrogen signaling peak. Since the enhancer does not loop with TFF3, the authors conclude that mechanisms other than estrogen receptor or enhancer-driven induction are responsible for TFF3 expression. Moreover, ERα intensity correlations show that both high and low levels of ERα are unfavorable for TFF1 expression. The ERα level correlations are further supported by overexpression of GFP-ERα. The authors conclude that the transcriptional machinery used by TFF1 for its acute activation can negatively impact TFF3 at the peak of signaling, but once the condensate dissolves, TFF3 benefits from it for its low expression.

      Strengths:

      The findings are indeed intriguing. The authors have maintained appropriate experimental controls, and their conclusions are well-supported by the data.

      Weaknesses:

      There are some major and minor concerns that relate to the approach, data presentation, and discussion. However, the authors have greatly improved the manuscript during the revision work.

      Comments on latest version:

      The authors have done a lot of work for the revision. The manuscript has been greatly improved.

    3. Reviewer #3 (Public review):

      Summary:

      In this manuscript, Bohra et al. measure the effects of estrogen responsive gene expression upon induction on nearby target genes using a TAD containing the genes TFF1 and TFF3 as a model. The authors propose that there is a sort of competition for transcriptional machinery between TFF1 (estrogen responsive) and TFF3 (not responsive) such that when TFF1 is activated and machinery is recruited, TFF3 is activated after a time delay. The authors attribute this time delay to transcriptional machinery that was being sequestered at TFF1 becoming available to the proximal TFF3 locus. The authors demonstrate through deletion that this activation is not dependent on contact with the TFF1 enhancer; instead, they conclude that it is dependent on a phase-separated condensate which can sequester transcriptional machinery. Although the manuscript reports an interesting observation that there is a dose dependence and time delay on the expression of TFF1 relative to TFF3, there is much room for improvement in the analysis and reporting of the data. Most importantly, there is no direct test of condensate formation at the locus in the context of this study: i.e. dissolution upon the enhancer deletion, decay in a temporal manner, and dependence of TFF1 expression on condensate formation. Using 1,6-hexanediol is not adequate to draw conclusions on the effect of condensates on a specific gene's activity, given current knowledge on its non-specificity and multitude of indirect effects. Thus, in my opinion, the major claim that the time-delayed expression of TFF3 is dependent on condensates is not supported by the current data.

      Strengths:

      The dependence of TFF1 expression on a single enhancer and the temporal delay in TFF3 are very interesting findings.

      The non-linear dependence of TFF1 and TFF3 expression on ER concentration is very interesting with potentially broader implications.

      The combined use of smFISH, enhancer deletion, and 4C to build a coherent model is a good approach.

      Weaknesses:

      There is no direct observation of a condensate at the TFF1 and TFF3 locus, of how this condensate changes over time after E2 treatment or upon enhancer deletion, or of whether transcriptional machinery is indeed concentrated within it, nor direct support for the other claims on condensate function and formation made in the manuscript. The use of 1,6-hexanediol is not appropriate to test this idea given how broadly it acts.

      Comments on latest version:

      I don't think the response to Reviewer 2's comment on LLPS condensates on TFF1 is adequate, and given that this point is essential to the claims of the manuscript, it must be addressed. Namely, the data from Saravanan, 2020 actually suggest that condensate formation at the locus is not very predictive and barely enriched over random spots. The claims in the manuscript that the condensate is responsible for sequestering transcriptional machinery are quite strong and the crux of the current model. To continue to make this claim (which I don't think is necessary since there are other possible models), the authors must test if the condensate at this locus (1) shows time dependent behavior, (2) is not present or weakened at the locus in cells that show high TFF3 expression, (3) is indeed enriched for transcriptional machinery when TFF1 peaks. The use of 1,6-hexanediol is not appropriate, as pointed out by Reviewer 2, and is no longer considered an appropriate experiment by many, as the whole notion of LLPS forming nuclear condensates is now under question. Such condensates can form through a variety of mechanisms, as reviewed for example by Mittag and Pappu (A conceptual framework for understanding phase separation and addressing open questions and challenges, Molecular Cell, 2022). Furthermore, given the distance between TFF1 and TFF3, it is hard to imagine how a condensate that concentrates machinery in a non-stoichiometric manner would not boost expression of both genes rather than being specific to one. There must be another mechanism in my opinion.

      I would recommend the authors remove this aspect of their manuscript/model and simply report their interesting findings that are actually supported by data: The temporal delay of TFF3 expression, the dependence on ER concentration, and the enhancer dependence.

    1. eLife Assessment

      This important study developed a mathematical model to predict biological age by leveraging physiological traits across multiple organ systems. The results presented are convincing, utilizing comprehensive data-driven approaches. However, additional external validation could further strengthen its generalizability. The model provides a way to identify environmental and genetic factors impacting aging and lifespan, revealing new factors potentially affecting aging. It also shows promise for evaluating therapeutics aimed at prolonging a healthy lifespan.

    2. Reviewer #1 (Public review):

      In this study, the authors developed a mathematical model to predict human biological ages using physiological traits. This model provides a way to identify environmental and genetic factors that impact aging and lifespan.

      Strength:

      (1) The topic addressed by the authors - human age prediction using physiological traits - is an extremely interesting, important, and challenging question in the aging field. One of the biggest challenges is the lack of well-controlled data from a large number of humans. However, the authors took this challenge and tried their best to extract useful information from available data.

      (2) Some of the findings can provide valuable guidelines for future experimental design for human and animal studies. For example, it was found that this mathematical model can best predict age when all different organ and physiological systems are sampled. This finding makes sense in general, but can be, and has been, neglected when people use molecular markers to predict age. Most of those studies have used only one molecular trait or different traits from one tissue.

      Weakness:

      (1) As I mentioned above, the Biobank data used here are not designed for this current study, so there are many limitations for model development using these data, e.g., missing data points and irrelevant measurements for aging. This is a common caveat for human studies and has been discussed by the authors.

      (2) There is no validation dataset to verify the proposed model. The authors suggested that human biological age can be predicted with a high accuracy using 12 simple physiological measurements. It will be super useful and convincing if another biobank dataset containing those 12 traits can be applied to the current model.

      Comments on revisions:

      In this revision, the authors improved the manuscript by adding discussion of two main weaknesses about human data limitation and model validation. My several other specific concerns and suggestions are all properly resolved.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this study, the authors developed a mathematical model to predict human biological ages using physiological traits. This model provides a way to identify environmental and genetic factors that impact aging and lifespan.

      Strengths:

      (1) The topic addressed by the authors - human age prediction using physiological traits - is an extremely interesting, important, and challenging question in the aging field. One of the biggest challenges is the lack of well-controlled data from a large number of humans. However, the authors took this challenge and tried their best to extract useful information from available data.

      Authors thank an anonymous reviewer for agreeing that physiological clock building and analysis is an interesting and important, even though challenging, task.

      (2) Some of the findings can provide valuable guidelines for future experimental design for human and animal studies. For example, it was found that this mathematical model can best predict age when all different organ and physiological systems are sampled. This finding makes sense in general but can be, and has been, neglected when people use molecular markers to predict age. Most of those studies have used only one molecular trait or different traits from one tissue.

      Authors thank an anonymous reviewer for highlighting the importance of the approach we employ to sample traits for biological age prediction from multiple organs and systems, which ultimately provides more holistic information.

      Weaknesses:

      (1) As I mentioned above, the Biobank data used here are not designed for this current study, so there are many limitations for model development using these data, e.g., missing data points and irrelevant measurements for aging. This is a common caveat for human studies and has been discussed by the authors.

      Thank you for pointing out the caveats. Indeed, most databases and datasets, including the UKBB that we use here, have missing or inaccurate entries. We do discuss this in the text, and we suggest and employ strategies to mitigate these caveats. We have now updated the text to highlight these issues even further. Specifically, in the second paragraph of the “Results” section, we added the following text: “Most large human databases and datasets, including UKBB, have certain limitations, such as incomplete or missing data points. Therefore, before proceeding to modelling aging, we needed to address the following three issues:”

      (2) There is no validation dataset to verify the proposed model. The authors suggested that human biological age can be predicted with high accuracy using 12 simple physiological measurements. It will be super useful and convincing if another biobank dataset containing those 12 traits can be applied to the current model.

      Thank you for this comment. Indeed, having a replication cohort would be quite valuable. As of today, there is no comparable dataset to verify performance of the clock model or to attempt to validate GWAS results. The closest possible is the NIH-led research program “All Of Us”, which aims to collect data on 1 million people, which unfortunately is not available to for-profit companies. It is theoretically possible to rebuild a clock only using a small number of phenotypes present in both datasets with the goal of training it on one dataset and test-applying it to another, but this won’t ultimately address the accuracy of the wholistic physiological clock presented here. We hope academic labs will utilize our clock-modeling approach and apply it to datasets currently unavailable to us and publish their findings.

      To strengthen the credentials of our biological clock, we would like to remind the reviewer that we performed 10 rounds of validation, where, in each round, 10% of the data were left out from the model training such that the clock was created using the remaining 90%. The model was subsequently tested on the 10% that was left out. Over 10 rounds, a different 10% of the data was left out each time, and statistics for this 10-fold cross-validation are available in the supplementary materials. We have now updated the text to make this validation more apparent.

      Specifically, we added to the “Results” section, “A mathematical model to predict age” subsection, third paragraph, the following text: “Specifically, we performed 10 rounds of cross-validation, where 10% of data were held out and the remaining 90% used for training. Over 10 rounds, different 10% were held out for validation. In each case, the findings were validated in the test set. Full statistics and approach are described in supplementary computational methods.”

      Additionally, this cross-validation is described in detail in the supplementary methods.
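
      The 10-fold procedure described in this response can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it uses synthetic data, a single hypothetical trait, and ordinary least squares as a stand-in for the PLS model, and the names `fit_line` and `k_fold_rmsep` are invented for the example. It reports the Root Mean Square Error of Prediction (RMSEP) on each held-out 10%.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x (one predictor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def k_fold_rmsep(xs, ys, k=10, seed=0):
    """Return the held-out RMSEP for each of k cross-validation rounds."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k disjoint held-out sets
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        intercept, slope = fit_line([xs[i] for i in train], [ys[i] for i in train])
        sq_err = [(ys[i] - (intercept + slope * xs[i])) ** 2 for i in held_out]
        scores.append(math.sqrt(sum(sq_err) / len(sq_err)))
    return scores


# Synthetic data (assumption): one hypothetical trait that declines with age.
rng = random.Random(42)
ages = [rng.uniform(40, 70) for _ in range(1000)]
trait = [100 - 0.8 * a + rng.gauss(0, 4) for a in ages]

# Predict age from the trait; train on 90%, test on the held-out 10%, ten times.
rmsep_per_fold = k_fold_rmsep(trait, ages, k=10)
print([round(r, 2) for r in rmsep_per_fold])
print("spread:", round(max(rmsep_per_fold) - min(rmsep_per_fold), 2))
```

      A small spread between folds is the kind of stability the response reports (variation of RMSEP below 0.1); the numbers printed here come from synthetic data and carry no biological meaning.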

      Additionally, we compared published GWAS results obtained for human aging clocks using modalities that were different yet relevant to human health. Specifically, we looked at GWAS for “Epigenetic Blood Age Acceleration” (Lu et al., 2018), ML-imaging-based human retinal aging clock (Ahadi et al., 2023), PhenoAgeAcceleration and BioAgeAcceleration (Kuo et al., 2021), and the ∆Age GWAS that we presented in our manuscript. We now describe the results of this comparison in our manuscript. Briefly, there is no overlap between GWAS results for any two of these published clocks built via different modalities – retina, DNA methylation, or physiological functions (between each other or with our model). However, there is a significant genetic overlap (p<10E-8) between clocks built using human phenotypic measures in a cohort of National Health and Nutrition Examination Survey (NHANES) III in the United States (7 variables) and ∆Age from the physiological clock from UKBB that we describe here (121 variables), further validating our approach. It is interesting to consider the reasons why genetic associations for human aging built using different modalities do not appear to have common genetic correlates, something we also now discuss in our manuscript.

      Specifically, we added to the “Results” section, “Genetic loci associated with biological age” subsection, third paragraph, the following text: “Additionally, we compared our ∆Age GWAS association results with similar GWAS studies that were performed for other biological clocks. For example, (McCartney et al., 2021) used DNA methylation data on 40,000 individuals to compute biological age called GrimAge. After that they calculated an intrinsic epigenetic age acceleration (IEAA, a value similar to ∆Age, which measured a deviation of biological age from chronological age) and performed GWAS.” Additionally, we added to the “Discussion” section, “Broader implications of the model for physiological aging” subsection, fourth paragraph, the following text: “To further analyze the meaning of genetic associations with ∆Age that we described above, we compared several published GWAS results obtained for human aging clocks using different health modalities. Specifically, we looked at GWAS for “Epigenetic Blood Age Acceleration” (Lu et al., 2018), ML-imaging-based human retinal aging clock (Ahadi et al., 2023), PhenoAgeAcceleration and BioAgeAcceleration (Kuo et al., 2021), and the ∆Age GWAS we presented in our manuscript. Surprisingly, we discovered that there is no overlap between GWAS results for any two of these clocks built via different modalities – retina, DNA methylation, or physiological functions. However, there is a significant genetic overlap between clocks built using human phenotypic measures and our ∆Age model we describe. For example, the Biological Age Clock Acceleration calculated using HbA1c, Albumin, Cholesterol, FEV, Urea nitrogen, SBP, and Creatinine (Levine, 2013) in a US cohort [from National Health and Nutrition Examination Survey (NHANES)] yielded 16 significant hits in the GWAS analysis, five of which were also significant in our GWAS for UKBB based ∆Age. These five common loci were close to the following genes - APOB, PIK3CG, TRIB1, SMARCA4, and APOE. The significance of this overlap is p < 10⁻⁸, suggesting that the ∆Age model we propose might be translatable to other cohorts of people.

      An interesting question to consider is why GWAS results from other clock modalities, such as DNA methylation and retinal imaging do not yield any genetic similarities to each other or to physiological and biological clocks. It is possible that these modalities of age assessment depend on completely genetically independent biological processes. For example, in a simplified manner - blood composition might be heavily weighted for DNA methylation, vascular structure for retinal scans, and muscle/bone/kidney health for physiological clocks. Data from model organisms suggest the master regulators of aging exist, and APOE is the best genetic variant known to influence human aging. Interestingly, only the biological and physiological clock models that we propose here pick it up as a hit. Alternatively, it is also possible that the true master regulators of aging rate are under stringent purifying selection; for example, due to an important role in development, and therefore, do not have genetic variability in human populations examined. As such, they could not be identified as hits in any GWAS studies.”

      Reviewer #2 (Public Review):

      In this manuscript, Libert et al. develop a model to predict an individual's age using physiological traits from multiple organ systems. The difference between the predicted biological age and the chronological age -- ∆Age, has an effect equivalent to that of a chronological year on Gompertz mortality risk. By conducting GWAS on ∆Age, the authors identify genetic factors that affect aging and distinguish those associated with age-related diseases. The study also uncovers environmental factors and employs dropout analysis to identify potential biomarkers and drivers for ∆Age. This research not only reveals new factors potentially affecting aging but also shows promise for evaluating therapeutics aimed at prolonging a healthy lifespan. This work represents a significant advancement in data-driven understanding of aging and provides new insights into human aging. Addressing the points raised would enhance its scientific validity and broaden its implications.

      Thank you!

      Major points:

      (1) Enhance the description and clarity of model evaluation.

      The manuscript requires additional details regarding the model's evaluation. The authors have stated "To develop a model that predicts age, we experimented with several algorithms, including simple linear regression, Gradient Boosting Machine (GBM) and Partial Least Squares regression (PLS). The outcomes of these approaches were almost identical". It is currently unclear whether the 'almost identical outcomes' mentioned refer to the similarity in top contribution phenotypes, the accuracy of age prediction, or both. To resolve this ambiguity, it would be beneficial to include specific results and comparisons from each of these models.

      Thank you for this comment. We now describe details of the model selection and provide data on outcome comparisons. Briefly, different approaches have different advantages and limitations; however, we chose one approach, and did not develop and analyze several independent models in parallel in order to not artificially inflate our False Discovery Rate (FDR). However, we now provide rationale and comparative performance of these three approaches. Specifically, we added to the “Results” section, “A mathematical model to predict age” subsection, first paragraph the following text: “Different approaches have different advantages and limitations; however, we decided to choose one approach, and not develop and analyze several independent models in parallel in order to not artificially inflate the False Discovery Rate (FDR). We ultimately selected PLS regression because it enabled us to determine the number and composition of components required to predict age optimally from the data, which provides additional insights into the biology of human aging. But before making this selection, we compared the performance of the three approaches. The outcomes of PLS and linear regression were almost identical (R-squared between ∆Age values derived by these two methods was 0.99, meaning that if one model were to predict an individual was 62 years old, the other model would have the same prediction). This similarity is likely due to the small number of predictors (121 phenotypes) and comparatively large number of participants (over 400,000). The correlation between GBM model outcomes and PLS (and linear regression) was slightly smaller (R-squared = 0.87). The reason for the lower correlation is likely the need for imputation in PLS and linear regression models. The GBM model tolerates missing data, whereas linear regression and PLS methods require imputation or removal of individuals with too many datapoints missing, an approach we describe in more detail below.”
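
      The model comparison quoted above rests on the R-squared between ∆Age values produced by two models for the same individuals. A minimal sketch of that calculation is given below, assuming R-squared is taken as the squared Pearson correlation; the function name `r_squared` and the five example values are hypothetical and are not data from the study.

```python
def r_squared(a, b):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov * cov / (var_a * var_b)


# Hypothetical delta-age values (years) for the same five people from two models.
delta_age_model_a = [-3.1, 0.4, 2.2, -1.0, 4.5]
delta_age_model_b = [-2.9, 0.6, 2.0, -1.2, 4.4]

agreement = r_squared(delta_age_model_a, delta_age_model_b)
print(round(agreement, 3))
```

      Because this statistic measures linear agreement, it would not by itself detect a constant offset between two models, so it is best read alongside an error metric such as RMSEP.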

      Additionally, after we obtained associations of ∆Age values with genetic loci, which formed the candidate base for gene targets to influence human aging (figure 5b), we verified the top associations obtained via the PLS model in the linear and GBM models. All the top candidates that we verified had statistically significant associations in all the models of ∆Age (CST3, APOE, HLA locus, CPS1, PIK3CG, IGF1). The precise strengths of the associations were different, but that is to be expected given that the linear models had some data imputed while the GBM model was built with missing values. We believe that due to the small number of predictors (121) compared to a vastly larger number of individuals (over 400,000), the differences the three models introduced to final outcomes were quite small.

      To convey this message, we added to the “Discussion” section, “Broader implications of the model for physiological aging” subsection, 7th paragraph, the following text: “It is interesting to note that the three approaches we used to generate age prediction model (PLS, GBM, and linear regression) yielded very similar or identical results in performance. We chose to settle on one approach (PLS) to not artificially inflate the False Discovery Rate (FDR); however, we verified that the top genetic loci associations obtained via the PLS model were also obtained in the GBM and linear models. Specifically, the top candidates (CST3, APOE, HLA locus, CPS1, PIK3CG, IGF1) identified in the PLS approach had statistically significant associations in all the models of ∆Age. It is likely that due to the small number of predictors (121) compared to a vastly larger number of individuals (over 400,000), the differences that these models introduce to final outcomes are quite small, which increases our confidence in the results.”

      Furthermore, the authors mention "to test for overfitting, a PLS model had been generated on randomly selected 90% of individuals and tested on the remaining 10% with similar results". To comprehensively assess the model's performance, it is crucial to provide detailed results for both the test and validation datasets. This should at least include metrics such as correlation coefficients and mean squared error for both training and test datasets.

      Thank you for bringing up this point. The cross-validation procedure and its statistics are described in detail in the supplementary computational methods. Briefly, across 10 rounds of validation, the Root Mean Square Error of Prediction (RMSEP) did not exceed 4.81 for females when all 9 PLS components were considered, and the RMSEP for males was 5.1 when all 11 components were considered. The variation of RMSEP between different datasets was less than 0.1. We have now updated the text to make this validation more apparent. Specifically, we added to the “Results” section, “A mathematical model to predict age” subsection, third paragraph the following text: “Specifically, we performed 10 rounds of cross-validation, where 10% of data were held out and the remaining 90% used for training. Over 10 rounds, different 10% were held out for validation. In each case, the findings were validated in the test set. Full statistics and approach are described in supplementary computational methods.”

      (2) External validation and generalization of results

      To enhance the robustness and generalizability of the study's findings, it is crucial to perform external validation using an independent population. Specifically, conducting validation with the participants of the 'All of Us' research program offers a unique opportunity. This diverse and extensive cohort, distinct from the initial study group, will serve as an independent validation set, providing insights into the applicability of the study's conclusions across varied demographics.

      Thank you for this comment. As we mentioned above, we agree that having a replication cohort would be very valuable for this study, as well as for many other studies that stem from the UKBB dataset. However, there is as yet no comparable dataset with which to verify the performance of the clock or to attempt to validate the GWAS results. The closest possible is the NIH-led research program “All of Us”, which aims to collect data on 1 million people but is unfortunately not available to for-profit companies. It is theoretically possible to rebuild a clock using only the small number of phenotypes present in both datasets, with the goal of training it on one dataset and test-applying it to another, but that approach would not ultimately be informative about the accuracy of the complete physiological clock presented here. We hope academic labs will utilize our clock approach, apply it to datasets currently unavailable to us, and publish their findings. For the detailed response on this issue, please see the response to the second comment of the first reviewer above.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Specific questions/suggestions:

      - It looks like the ages of participants are enriched around 60 years (Fig. 1, Fig 3b). Can authors clarify whether age distribution affects the correlation tests (e.g. correlation in Fig 2)?

      Indeed, the distribution of people by age is enriched for 60–65-year-olds and is depleted at younger and older ages. Such a distribution influences the uncertainty of the correlations that we compute, with error bars being larger for 40- and 70-year-olds and smaller for 50- and 60-year-olds. An example of this can be seen in Figure 1F. Figures 2a,b,g,h mostly deal with the correlation of phenotypes with each other and thus are not influenced by age. For other computations, such as age prediction, it is theoretically possible that if age determinants among 65-year-olds differ from those for 40- or 80-year-olds, the calculated contributions would be skewed to increase accuracy in the middle of the distribution at the expense of the ends. ∆Age, however, was explicitly normalized for each age cohort (Fig. 3a) to avoid “birth cohort” bias, therefore minimizing the effect of uneven distribution on further analysis, such as GWAS. We now acknowledge and describe this feature of the UKBB dataset in the first paragraph of the “Results” section.
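
      To make the per-cohort normalization mentioned above concrete, here is a minimal sketch. The response does not spell out the exact normalization used, so this assumes simple mean-centering of ∆Age (predicted minus chronological age) within each chronological-age cohort; the function name and the toy numbers are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def normalize_delta_age(chronological_ages, predicted_ages):
    """Mean-center delta-age within each chronological-age cohort.

    Assumption: "normalized for each age cohort" is taken to mean that the
    cohort's average delta-age is subtracted from each member's delta-age.
    """
    deltas_by_cohort = defaultdict(list)
    for age, predicted in zip(chronological_ages, predicted_ages):
        deltas_by_cohort[age].append(predicted - age)
    cohort_mean = {age: mean(d) for age, d in deltas_by_cohort.items()}
    return [(predicted - age) - cohort_mean[age]
            for age, predicted in zip(chronological_ages, predicted_ages)]


# Toy example: two 40-year-olds and two 60-year-olds.
ages = [40, 40, 60, 60]
predicted = [44, 48, 57, 59]
normalized = normalize_delta_age(ages, predicted)
print(normalized)
```

      After this step every cohort has an average ∆Age of zero, so an uneven number of participants per age cannot by itself induce a correlation between ∆Age and age.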

      - Phenotypic variation usually increases during aging. However, the authors showed that delta-age and age are not correlated (Figure 3a), suggesting that biological variation does not increase during aging in their analysis. Can authors provide more evidence supporting their findings? Is this phenomenon affected by their normalization method?

      Thank you for this comment. We find that there is no strict rule for how phenotypic variation changes with age. Certain phenotypes, such as blood pressure (Fig. 1a) or SHBG (Fig. 1d), indeed increase in variation with advanced age; however, many others, such as grip strength (Fig. 1b) and BMI, do not change in variation, and certain phenotypes even decrease in variation with age. As we stated above, in order to minimize the possible effect of “birth cohort” bias on subsequent analysis, as well as the uneven distribution of people across ages, ∆Age was normalized per age cohort. Additionally, purifying selection likely also limits how far most physiological factors can deviate. For example, people with too high or too low blood pressure would simply perish, which would limit a continuous increase in variation.

      - Authors correlate GWAS data with delta-age (Figure 4). It would be important to show whether the delta-age from young and old participants correlates with GWAS patterns in a similar manner. If not, the authors have to consider how age differences affect delta-age and the GWAS correlation. For example, the authors mentioned that APOE genotype influences age-delta even in the 40-year-old group (Figure 4f). If the APOE genotype already shows high delta-age in the 40-year-old group, how does aging affect the delta-age distribution?

      Thank you for this comment. How age influences the GWAS hits identified through ∆Age is an interesting question. At the same time, one must remember that our dataset is cross-sectional in nature and that “different age” in reality denotes a subset of different people, who lived in different times with different exposures to environments and different standards of medical care (which evolve over time). We specifically attempted to factor age and this “cohort effect” out of our analysis and presented Figure 4f simply as an illustration that APOE variants seem to influence human aging at any age, which challenges the theory proposed by previous studies that APOE is implicated in aging simply because APOE4 carriers likely die from Alzheimer's disease and are thus excluded from the oldest cohorts. To investigate the question raised by the reviewer, it is possible to do GWAS on age; however, one must keep in mind the limitations associated with interpreting those results, as “age” in reality (in this cross-sectional cohort) also represents changes in population composition, changes in the environment, food quality, early life care, medical care, social habits, and other parameters associated with a changing society.

      - For the discussion part, it would be great if the authors could add one section to provide guidelines for future human and lab animal studies based on observations from the current study. For example, what physiological traits are most useful, and what can be further added when collecting human data?

      Thank you for the great suggestion. We now propose and discuss certain experiments that can be performed in humans and animals to better differentiate between drivers and markers of aging.

      - In line 479, I found the statement "It is possible that synapse function accounts for the association of computer gaming with ΔAge" came from nowhere, and suggest removing it.

      Done—thank you.

      - Minor. Line 155. Is it a wrong citation of table S2c, 2d as there are only 2a and 2b?

      Thank you, corrected.

      Reviewer #2 (Recommendations For The Authors):

      (1) Between lines 300-305, there is a missing reference to Figure 3e.

      Thank you, corrected.

      (2) For Figures 4a and 4c, please add the lambda statistic to the QQ plots.

      Thank you, we have added lambda inflation factors to the QQ plots.
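
      For readers unfamiliar with the lambda statistic requested here, the sketch below shows the commonly used definition: the median of the observed 1-degree-of-freedom chi-square statistics divided by the expected median under the null (about 0.4549). It is an illustration only, not the authors' pipeline; the function name and the synthetic p-values are assumptions, and p-values must lie strictly above 0.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation(p_values):
    """Genomic inflation factor (lambda) from two-sided GWAS p-values.

    Each p-value is converted back to a 1-df chi-square statistic via the
    squared standard-normal quantile; p-values must satisfy 0 < p <= 1.
    """
    standard_normal = NormalDist()
    chi2_stats = [standard_normal.inv_cdf(1 - p / 2) ** 2 for p in p_values]
    return median(chi2_stats) / CHI2_1DF_MEDIAN


# Evenly spaced p-values mimic a well-calibrated null distribution.
null_p_values = [(i + 0.5) / 1000 for i in range(1000)]
lambda_null = genomic_inflation(null_p_values)

# Halving every p-value mimics systematic inflation of test statistics.
lambda_inflated = genomic_inflation([p / 2 for p in null_p_values])
print(lambda_null, lambda_inflated)
```

      A value close to 1 indicates that the bulk of the test statistics follows the null expectation; values well above 1 point to confounding such as population stratification, or to strong polygenicity.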

      (3) In line 384, the p-value cut-off is mentioned as 10⁻⁹. However, this does not seem to be consistently represented in Figures 4b and 4d, where the gray lines do not align with this threshold. Please adjust these figures to accurately reflect the mentioned p-value cut-off.

      Thank you, corrected.

      (4) Clarification for Figure 5a. Add titles and correlation coefficients to Figure 5a to clearly define what the clusters represent. Please also add a discussion to explain why the cluster 10 (general health) dropout model can affect ∆Age compared to the full model, with some individuals showing a 5-year difference. Furthermore, despite the substantial effect of removing cluster 10 on ΔAge, all the top loci remain unchanged in terms of effect sizes and p-values compared to the full model.

      We have added the titles and correlation coefficients to Figure 5a. Thank you for these suggestions; they make the presentation of the data much clearer. It is an interesting observation that, whereas dropping out cluster 10 resulted in quite significant changes in the ∆Age distribution, the genetic signature as determined by GWAS did not change much. The most obvious explanation is that many parameters in this category are influenced by environment more than by genetics; therefore, the genetic signature did not change much after the cluster removal. We now mention this observation in the text. Specifically, in the subsection “Cluster-dropout analysis enriches for GWAS hits that influence aging globally”, we added the following text: “Another interesting observation is that degree by which certain cluster contributes to the model does not necessarily correlate with how much this cluster contributes to genetic signature of human aging. For example, while dropping out cluster 10 (General Health) resulted in quite significant changes of ∆Age distribution (R²=0.88), the genetic signature as determined by GWAS did not change substantially. The most likely explanation is that many parameters in this category are influenced by environment more strongly than by genetics; for example, not as much as caused by cluster 1 (muscle-related) removal.”

      (5) Discussion on drivers and markers. Given the theoretical nature of the study, it would be beneficial to propose potential experimental validations for your findings. Even if these validations have not been performed, suggesting them would greatly enhance the value of the discussion.

      Thank you, it is a great idea. We now propose and discuss certain experiments that can be performed in humans and animals to better differentiate between drivers and markers of aging. Specifically, in the subsection “Cluster-dropout analysis enriches for GWAS hits that influence aging globally”, we added the following text: “To definitively distinguish whether a gene is a driver or a marker of aging, an experiment would need to be performed. It is possible that certain gene activities are influenced by existing FDA-approved medications, and retrospective analyses of human cohorts who take certain medications can be performed. More likely, however, an animal model would need to be employed, where animals with candidate genes modified via genetic means are investigated for lifespan and onset and progression of age-associated conditions. For example, one can engineer a mouse with a conditional allele of Cystatin-C and evaluate how changes in dosage of this protein influence various phenotypes of aging.”

    1. eLife Assessment

      This potentially useful study introduces an orthogonal approach for detecting RNA modification, without chemical modification of RNA, which often results in RNA degradation and therefore loss of information. Compared to previous versions, the most recent one is improved and sufficiently aligned with the standards of the field to merit consideration by the research community, making the evidence solid according to said standards. Nevertheless, uncertainty regarding false positive and false negative rates remains, as it does for some of the alternative approaches. With more rigorous validation, the approach might be of particular interest for sites in RNA molecules where modifications are rare.

    2. Reviewer #2 (Public review):

      The fledgling field of epitranscriptomics has encountered various technical roadblocks with implications as to the validity of early epitranscriptomics mapping data. As a prime example, the low specificity of (supposedly) modification-specific antibodies for the enrichment of modified RNAs, has been ignored for quite some time and is only now recognized for its dismal reproducibility (between different labs), which necessitates the development of alternative methods for modification detection. Furthermore, early attempts to map individual epitranscriptomes using sequencing-based techniques are largely characterized by the deliberate avoidance of orthogonal approaches aimed at confirming the existence of RNA modifications that have been originally identified.

      Improved methodology, the inclusion of various controls, and better mapping algorithms as well as the application of robust statistics for the identification of false-positive RNA modification calls have allowed revisiting original (seminal) publications whose early mapping data allowed making hyperbolic claims about the number, localization and importance of RNA modifications, especially in mRNA. Besides the existence of m6A in mRNA, the detectable incidence of RNA modifications in mRNAs has drastically dropped.

      As for m5C, the subject of the manuscript submitted by Zhou et al., its identification in mRNA goes back to Squires et al., 2012, reporting on >10,000 sites in mRNA of a human cancer cell line, followed by intermittent findings reporting on pretty much every number between 0 and >100,000 m5C sites in different human cell-derived mRNA transcriptomes. The reason for such discrepancy is most likely of a technical nature. Importantly, all studies reporting on actual transcript numbers that were m5C-modified relied on RNA bisulfite sequencing, an NGS-based method that can discriminate between methylated and non-methylated Cs after chemical deamination of C but not m5C. RNA bisulfite sequencing has a notoriously high background due to deamination artifacts, which occur largely due to incomplete denaturation of double-stranded regions (denaturing-resistant) of RNA molecules. Furthermore, m5C sites in mRNAs have now been mapped to regions that have not only sequence identity but also structural features of tRNAs. Various studies revealed that the highly conserved m5C RNA methyltransferases NSUN2 and NSUN6 do not only accept tRNAs but also other RNAs (including mRNAs) as methylation substrates, which in combination account for most of the RNA bisulfite-mapped m5C sites in human mRNA transcriptomes. Is m5C in mRNA only a result of the star activity of tRNA or rRNA modification enzymes, or is its low stoichiometry biologically relevant?

      In light of the shortcomings of existing tools to robustly determine m5C in transcriptomes, other methods, like DRAM-seq, aiming to map m5C independently of ex situ RNA treatment with chemicals, are needed to arrive at a more solid "ground state", from which it will be possible to state and test various hypotheses as to the biological function of m5C, especially in lowly abundant RNAs such as mRNA.

      Importantly, the identification of >10,000 m5C-containing sites through DRAM-seq increases the number of potential m5C marks in human cancer cells from a couple of hundred (after rigorous post-hoc analysis of RNA bisulfite sequencing data) by orders of magnitude. This raises the question of whether the application of these editing tools results in editing artefacts that overstate the number of actual m5C sites in the human cancer transcriptome.

      [Editors' note: earlier reviews have been provided here: https://doi.org/10.7554/eLife.98166.3.sa1; https://doi.org/10.7554/eLife.98166.2.sa1; https://doi.org/10.7554/eLife.98166.1.sa1]

    1. eLife Assessment

      The research presents valuable findings on the impact of FRMD8 loss on tumor progression and resistance to tamoxifen therapy. Through a series of convincing and systematic experiments, the author thoroughly investigates the role of FRMD8 in breast cancer and its underlying regulatory mechanisms. The study confirms that FRMD8 holds potential as a therapeutic target for reversing tamoxifen resistance, offering helpful insights for future treatment strategies.

    2. Reviewer #1 (Public review):

      Summary:

      Tamoxifen resistance is a common problem in some ER-positive patients undergoing endocrine therapy, and this manuscript addresses a question of practical clinical importance. The authors found that the absence of FRMD8 in breast epithelial cells can promote the progression of breast cancer, hypothesized that FRMD8 affects tamoxifen resistance, and validated this hypothesis through a series of experiments. The manuscript has some theoretical reference value.

      Strengths:

      At present, research on the role of FRMD8 in breast cancer is very limited. This manuscript leverages the MMTV-Cre+;Frmd8fl/fl;PyMT mouse model to study the role of FRMD8 in tamoxifen resistance and uses single-cell sequencing to uncover the interaction between FRMD8 and ESR1. At the mechanistic level, the manuscript demonstrates two ways in which FRMD8 affects ERα, providing some new insights into the development of ER-positive breast cancer in patients who are resistant to tamoxifen.

      Limitations:

      Whether FRMD8 can become a biomarker should be verified in large clinical samples or clinical data.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript presents a valuable finding on the impact of FRMD8 loss on tumor progression and the resistance to tamoxifen therapy. The author conducted systematic experiments to explore the role of FRMD8 in breast cancer and its potential regulatory mechanisms, confirming that FRMD8 could serve as a potential target to reverse tamoxifen resistance.

      The research is logically coherent and persuasive. The results support their conclusions and have achieved the research objectives.

    1. eLife Assessment

      This study presents valuable findings on the control of survival and maintenance of a specific set of brain resident immune cells. The authors generate a new animal model to enable sophisticated analysis of cell function in vivo. The sophisticated knock-in/knock-out alleles are compelling, although the work would ultimately be strengthened with further mechanistic analyses.

    2. Reviewer #1 (Public review):

      Summary:

      The article entitled "Pu.1/Spi1 dosage controls the turnover and maintenance of microglia in zebrafish and mammals" by Wu et al., identifies a role for the master myeloid developmental regulator Pu.1 in the maintenance of microglial populations in the adult. Using a non-homologous end joining knock-in strategy, the authors generated a pu.1 conditional allele in zebrafish, which reports wildtype expression of pu.1 with EGFP and truncated expression of pu.1 with DsRed after Cre-mediated recombination. When crossed to existing pu.1 and spi-b mutants, this approach allowed the authors to target a single allele for recombination and induce homozygous loss-of-function microglia in adults. This identified that although there is no short-term consequence to loss of pu.1, microglia lacking any functional copy of pu.1 are depleted over the course of months, even when spi-b is fully functional. The authors go on to identify reduced proliferation, increased cell death, and higher expression of tp53 in the pu.1 deficient microglia, as compared to the wild-type EGFP+ microglia. To extend these findings to mammals, the authors generated a conditional Pu.1 allele in mice and performed similar analyses, finding that loss of a single copy of Pu.1 resulted in similar long-term loss of Pu.1-deficient microglia. The conclusions of this paper are overall well supported by the data.

      Strengths:

      The genetic approaches here for visualizing the recombination status of an endogenous allele are very clever, and by comparing the turnover of wildtype and mutant cells in the same animal the authors can make very convincing arguments about the effect of chronic loss of pu.1. Likely this phenotype would be either very subtle or nonexistent without the point of comparison and competition with the wildtype cells.

      Using multiple species allows for more generalizable results, and shows conservation of the phenomena at play.

      The demonstration of changes to proliferation and cell death in concert with higher expression of tp53 is compelling evidence for the authors' argument.

      Weaknesses:

      This paper is very strong. It would benefit from further investigation of the specific relationship between pu.1 and tp53. Does pu.1 interact with the tp53 locus? Molecular analysis of this interaction would strengthen the mechanistic findings.

    3. Reviewer #2 (Public review):

      Summary:

      In the presented work by Wu et al., the authors investigate the role of the transcription factor Pu.1 in the survival and maintenance of microglia, the tissue-resident macrophage population in the brain. To this end, they generated a sophisticated new conditional pu.1 allele in zebrafish using CRISPR-mediated genome editing, which allows visual detection of expression of the mutant allele through a switch from GFP to dsRed after Cre-mediated recombination. Using EdU pulse-chase labelling, they first estimated the daily turnover rate of microglia in the adult zebrafish brain, which was found to be higher than rates previously estimated for mice and humans. After conditional deletion of pu.1 in coro1a-positive cells, they do not find a difference in microglia number at 2 and 8 days or 1 month post-injection of Tamoxifen. However, at 3 months post-injection, a strong decrease in mutant microglia could be detected. While no change in microglia number was detected at 1 mpi, an increase in apoptotic cells and decreased proliferation were observed. RNA-seq analysis of WT and mutant microglia revealed an upregulation of tp53, which was shown to play a role in the depletion of pu.1 mutant microglia, as deletion in tp53-/- mutants did not lead to a decrease in microglia number at 3 mpi. Through analysis of microglia number in pu.1 mutants, the authors further show that the depletion of microglia in the conditional mutants is dependent on the presence of WT microglia. To show that the phenomenon is conserved between species, similar experiments were also performed in mice.

      This work expands on previous in vitro studies using primary human microglia. The majority of conclusions are well supported by the data; the addition of controls and experimental details would strengthen the conclusions and rigor of the paper.

      Strengths:

      Generation of an elegantly designed conditional pu.1 allele in zebrafish that allows for the visual detection of expression of the knockout allele.

      The combination of analysis of pu.1 function in two model systems, zebrafish and mouse, strengthens the conclusions of the paper.

      Confirmation of the functional significance of the observed upregulation of tp53 in mutant microglia through double mutant analysis provides some mechanistic insight.

      Weaknesses:

      (1) The presented RNA-Seq analysis of mutant microglia is underpowered and details on how the data was analyzed are missing. Only 9-15 cells were analyzed in total (3 pools of 3-5 cells each). Further, the variability in relative gene expression of ccl35b.1, which was used as a quality control and inclusion criterion to define pools consisting of microglia, is extremely high (between ~4 and ~1600, Figure S7A).

      (2) The authors conclude that the reduction of microglia observed in the adult brain after cKO of pu.1 in the spi-b mutant background is due to apoptosis (Lines 213-215). However, they only provide evidence of apoptosis in 3-5 dpf embryos, a stage at which loss of pu.1 alone does lead to a complete loss of microglia (Figure 2E). A control of pu.1 KI/d839 mutants treated with 4-OHT should be added to show that this effect is indeed dependent on the loss of spi-b. In addition, experiments should be performed to show apoptosis in the adult brain after cKO of pu.1 in spi-b mutants as there seems to be a difference in the requirement of pu.1 in embryonic and adult stages.

      (3) The number of microglia after pu.1 knockout in zebrafish showed a significant decrease only 3 months after 4-OHT injection, whereas in mice microglia were almost completely depleted as early as 7 days after injection. This major difference is not discussed in the paper.

      (4) Data are represented as mean ± SEM. Instead of SEM, standard deviation should be shown in all graphs to show the variability of the data. This is especially important for all graphs where individual data points are not shown. It should also be stated in each figure legend whether SEM or SD is shown.
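
      The distinction behind this request can be shown with a short sketch: SD describes the spread of the individual measurements, whereas SEM is SD divided by the square root of the sample size and therefore shrinks as more animals are added. The counts below are made-up illustrative values, not data from the manuscript, and `summarize` is a hypothetical helper.

```python
import math
from statistics import mean, stdev


def summarize(values):
    """Return sample size, mean, sample SD, and SEM (SD / sqrt(n))."""
    sd = stdev(values)  # sample standard deviation (n - 1 denominator)
    return {
        "n": len(values),
        "mean": mean(values),
        "sd": sd,
        "sem": sd / math.sqrt(len(values)),
    }


# Hypothetical microglia counts per brain section (illustrative only).
counts = [2, 4, 4, 4, 5, 5, 7, 9]
summary = summarize(counts)
print(summary)
```

      Because SEM depends on n, two groups with identical variability but different sample sizes would show different error bars under SEM and identical ones under SD.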

    1. eLife Assessment

      Du et al. present a valuable study on neural activation in medial prefrontal cortex (mPFC) subpopulations projecting to the basolateral amygdala (BLA) and nucleus accumbens (NAc) during behavioral tasks assessing anxiety, social preference, and social dominance. The study has innovative approaches and solid in vivo calcium imaging data, but the evidence linking neural physiology to behavioral outcomes is incomplete. Addressing these gaps would significantly enhance the understanding of how distinct mPFC→BLA and mPFC→NAc pathways influence anxiety, exploration, and social behaviors.

    2. Reviewer #1 (Public review):

      Summary:

      It is well known that neurons in the medial prefrontal cortex (mPFC) are involved in higher cognitive functions such as executive planning, motivational processing, and internal state-mediated decision-making. These internal states often correlate with the emotional states of the brain. While several studies point to the role of mPFC in regulating behavior based on such emotional states, the diversity of information processing in its sub-populations remains a less explored territory. In this study, the authors try to address this gap by identifying and characterizing some of these sub-populations in mice using a combination of projection-specific imaging, function-based tagging of neurons, multiple behavioral assays, and ex-vivo patch clamp recordings.

      Strengths:

      The authors targeted mPFC projections to the nucleus accumbens (NAc) and basolateral amygdala (BLA). Using the open field task (OFT), the authors identified four relevant behavioral states as well as neurons active while the animal was in the center region ("center-ON neurons"). By characterizing single-unit activity and using dimensionality reduction, the authors show differentiated coding of behavioral events at both the projection and functional levels. They further substantiate this effect by showing higher sensitivity of mPFC-BLA center-ON neurons during time spent in the open arms of the elevated plus maze (EPM). The authors then pivoted to the three-chamber social interaction (SI) assay to show that different subsets of neurons encode preference for a social stimulus over a non-social one. This reveals an interesting diversity in the function of these sub-populations on multiple levels. Lastly, the authors used the tube test as a manipulation of the anxiety state of mice and compared behavioral differences before/after the OFT and social interaction tasks. This experiment revealed that "losers" of the tube test spend less time in the center of the open field while "winners" show a stronger preference for the familiar mouse over the object. Using patch-clamp experiments, the authors also found that "winners" exhibit stronger synaptic transmission in the mPFC-NAc projection while "losers" exhibit stronger synaptic transmission in the mPFC-BLA projection. Given the popularity of the tube test assay in rank determination, this provides useful insights into possible effects on anxiety levels and synaptic plasticity. Overall, the many experiments performed by the authors reveal interesting differences in mPFC neurons relative to their involvement in high or low anxiety behaviors, social preference, and social rank.

      Weaknesses:

      The authors focused primarily on female mice without commenting on the effect that sex differences would have on their results. While the authors have identified relevant behavioral states across the various behavioral tasks, there is still a missing link between them and "emotional states", the phrase they use emphatically throughout the manuscript. The authors have neither provided adequate references to fill this gap nor shared any data pertaining to relevant readouts such as cortisol levels. For both the projection-specific recordings and the patch-clamp experiments, including histology reports in the manuscript would provide essential information for anyone trying to replicate the results, especially since it is known that sub-populations in the BLA and NAc can have vastly different functions. The population-level analysis in the manuscript requires more rigor to reduce bias, as well as statistical controls to establish the significance of the results. Lastly, the tube test is used as a manipulation of the "emotional state" in several of the experiments. While the tube test can cause a temporary spike in anxiety of the participating mice, it is not known to produce a sustained effect unless there are additional interventions such as forced social defeat. Thus, additional controls for these experiments are essential to support claims based on changes in the emotional state of mice. Apart from the methodology, the manuscript could also be improved with the addition of clear scatter points in all the plots, along with detailed measures of the statistical tests such as exact p values and the size of the groups being compared.

    3. Reviewer #2 (Public review):

      Summary:

      The goal of this study was to understand how two separate classes of projection neurons from the medial prefrontal cortex, those innervating the basolateral amygdala (BLA) and nucleus accumbens (NAc), contribute to the encoding of emotional behaviors. The authors record the activity of these different neuron classes across three different behavioral environments. They propose that, although both populations are involved in emotional behavior, the two populations have diverging activity patterns in certain contexts. A subset of projections to the NAc appears particularly important for social behavior. They then attempt to link these changes to the emotional state of the animal and changes in synaptic connectivity.

      Strengths:

      The behavioral data build on previous studies of these projection neurons, supporting distinct roles in behavior, and extend previous work by looking at the heterogeneity within different projection neurons across contexts.

      Weaknesses:

      The diversity of neurons mediating these projections and their targeting within the BLA and NAc is not explored. These are not homogeneous structures and so one possibility is that some of the diversity within their findings may relate to targeting of different sub-structures within each region. The electrophysiological data have significant experimental confounds and more methodological information is required to support other conclusions related to these data.

    4. Reviewer #3 (Public review):

      Summary:

      This manuscript investigates the distinct contributions of mPFC→BLA and mPFC→NAc pathways in emotional regulation, with implications for understanding anxiety, exploration, and social preference behaviors. Using Ca2+ imaging, optogenetics, and patch-clamp recording, the authors demonstrate pathway-specific roles in encoding emotional states of opposite valence. They further identify subsets of neurons ("center-ON") with heightened activity under anxiety-inducing conditions. These findings challenge the traditional view of functional similarity between these pathways and provide valuable insights into neural circuit dynamics relevant to emotional disorders.

      The study is well-designed and addresses an important topic, but several methodological and interpretational issues require clarification to strengthen the conclusions.

      Weaknesses:

      Major Weaknesses:

      (1) The manuscript does not clearly and consistently specify the sex of the mice used for behavioral and imaging experiments. Given the known influence of sex on emotional behaviors and neural activity, this omission raises concerns about the generalizability of the findings. The authors should make clear throughout the manuscript whether male, female, or mixed-sex cohorts were used and provide a rationale for their choice. If only one sex was used, the potential limitations of this approach should be explicitly discussed.

      (2) Mice lacking "center-ON" neurons were excluded from analysis, yet the manuscript draws broad conclusions about the encoding of emotional states by mPFC pathways. It is critical to justify this exclusion and discuss how it may limit the generalizability of the findings. The inclusion of data or contextualization for animals without center-ON neurons would strengthen the interpretation.

      (3) The manuscript lacks baseline activity comparisons for mPFC→BLA and mPFC→NAc pathways across subjects. Providing baseline data would contextualize the observed activity changes during behavior testing and help rule out inter-individual variability as a confounding factor.

      (4) Extensive behavioral testing across multiple paradigms may introduce stress and fatigue in the animals, which could confound the induction of emotional states. The authors should describe the measures taken to minimize these effects (e.g., recovery periods, randomized testing order) and discuss their potential impact on the results.

      (5) Grooming is described as a "non-anxiety" behavior, which conflicts with its established role as a stress-relieving behavior that may indicate anxiety. This discrepancy requires clarification, as the distinction is central to the conclusions about the mPFC→BLA pathway's role in differentiating anxiety-related and non-anxiety behaviors.

      (6) While the study highlights pathway-specific neural activity, it lacks a cohesive integration of these findings with the behavioral data. Quantifying the overlap or decorrelation of neuronal activity patterns across tasks would solidify claims about the specialization of mPFC→NAc and mPFC→BLA pathways. Likewise, the discussion should be expanded to place these findings in light of prior studies that have probed the roles of these pathways in social/emotion/valence-related behaviors.

      Minor Weaknesses:

      (1) The manuscript does not explicitly state whether the same mice were used across all behavioral assays. This information is critical for evaluating the validity of group comparisons. Additionally, more detail on sample sizes per assay would improve the manuscript's transparency.

      (2) In Figure 2G, the difference between BLA and NAc activity during exploratory behaviors (sniffing) is difficult to discern. Adjusting the scale or reformatting the figure would better illustrate the findings.

      (3) While the characteristics of the first social stimulus (M1) are specified, there is no information about the second social stimulus (M2). This omission makes it difficult to fully interpret the findings from the three-chamber test.

      (4) The methods section lacks detailed information about statistical approaches and animal selection criteria. Explicitly outlining these procedures would improve reproducibility and clarity.

    5. Author response:

      Reviewing editor comments:

      Overall, the reviewers found the imaging data to be strong but identified the physiology experiments as the weakest aspect of the study. Please consider either removing Figures 7 and 8 from the manuscript or significantly revising the data. If you choose to revise these figures, refer to the specific reviewer comments addressing them. Additionally, several reviewers noted that the prior literature was not adequately cited, so please consider addressing this concern.

      As noted below, we will work to strengthen the physiological side of the study and ensure that we are more scrupulous in citing the prior literature. Below we summarize the major concerns of each reviewer and outline our proposed response.

      Reviewer #1:

      (1) Sex differences and generalizability

      Various studies have shown sex differences in emotional responses and neural activity in mice, but to study both male and female mice would have required much larger numbers of mice than we could accommodate for practical reasons, so we chose to use only female mice to lay a solid foundation for future studies that compare (and perhaps contrast) males.

      We will:

      Make clear in the main text that we used only females.

      Cite literature on sex-specific mPFC-BLA/NAc functions in the Discussion.

      (2) Missing link between behavioral states and "emotional states"...relevant readouts such as cortisol

      We appreciate the reviewer pointing out this inadvertent conceptual slippage. We will:

      Include corticosterone measurements, made with an ELISA kit on archived plasma samples (collected before and after OFT/EPM tests), to correlate with behavioral and neural activity (following the approach of Panczyszyn-Trzewik et al., Steroids, 2024).

      Be more precise in our language to differentiate behavioral correlates from inferred emotional states.

      Carefully review the literature on OFT center time, EPM open-arm exploration, and tube test outcomes as anxiety/social hierarchy indicators and decide the best interpretation for our findings.

      (3) Improve methodological detail and rigor of population-level analysis

      We will:

      Expand the methods section with electrophysiology parameters (e.g., access resistance criteria, stimulus protocols).

      Add detailed histology figures (viral targeting, electrode placements) for mPFC-BLA/NAc projections.

      Include raw data points in all plots and report exact p-values, effect sizes, and group sizes (e.g., n = 12 cells from 4 mice).

      To enhance statistical rigor, we will provide clearer scatter plots with individual data points, report exact p-values, and specify group sizes in all figures.

      (4) Acute vs. sustained effects after tube test and additional controls

      We would like to clarify that we used repeated tube tests (3 times a day and continuing for 7 days) for assessing sustained rank effects. To address concerns about sustained emotional state changes post-tube test, we will:

      Assess corticosterone levels pre/post-tube test (following the approach of Panczyszyn-Trzewik et al., Steroids, 2024).

      Discuss the transient nature of hierarchy effects and cite studies using repeated tube tests for sustained rank effects.

      Reviewer #2:

      (1) Sub-region targeting in BLA/NAc

      Although different subregions within the BLA and NAc receive distinct inputs and exhibit diverse functions, comparing neuronal activity across these subregions is beyond the scope of this paper. Our primary focus is on mPFC projections, emphasizing presynaptic activity rather than postsynaptic activity within the NAc and BLA. We focused on the PL-NAc shell and PL-BLA (BA) regions because PL-to-NAc shell projections in mice are well-documented, particularly in studies utilizing viral tracers and optogenetic tools (Britt et al., Neuron, 2012; Bossert et al., J. Neurosci., 2012). These projections regulate aversive behaviors, stress responses, and motivational states and are implicated in drug-seeking behaviors and emotional valence encoding (Jocelyn & Berridge, Biol. Psychiatry, 2013; Fetcho et al., Nat. Commun., 2023; Capuzzo & Floresco, J. Neurosci., 2020; Xie et al., BioRxiv., 2025; Domingues et al., Nat Commun., 2025). The PL-BLA projection in turn sends topographically organized projections to BLA subregions, primarily targeting the basal (BA) nuclei of the BLA (McGarry & Carter, J. Neurosci., 2016; Hoover & Vertes, Brain Struct. Funct., 2007). Both the recorded NAc shell and BLA subregions are involved in emotional valence encoding.

      A detailed comparison of neuronal activity across different NAc shell and BLA subregions, or across different cell types such as NAc shell D1- and D2-medium spiny neurons, could each be the subject of a separate study. Nevertheless, we will discuss in the Discussion how sub-region connectivity could contribute to the observed heterogeneity, citing relevant studies, and we will clarify the rationale for our experimental design.

      (2) Electrophysiological confounds

      To strengthen the rationale for our patch-clamp recordings, we will:

      Clarify in methods that recordings were performed in acute slices from behaviorally naive mice (post-tube test) to isolate synaptic changes.

      Include access resistance and cell health criteria (e.g., resting membrane potential, input resistance ranges), along with precise optogenetic stimulus protocols.

      Add example traces of mEPSCs/mIPSCs and quantify exclusion rates.

      Reviewer #3:

      (1) Specify the sexes used throughout the manuscript.

      We will make this clear throughout the paper.

      (2) Exclusion of mice lacking "center-ON" neurons

      We will:

      Explain the exclusion of mice that lacked center-ON neurons. We will also discuss the potential interpretations (e.g., floor effects in anxiety tasks) in the limitations section.

      (3) Baseline activity comparisons

      We will:

      Add baseline neuronal activity comparison between mPFC-BLA and mPFC-NAc neurons.

      (4) Stress from repeated behavioral testing

      We will:

      Clarify our experimental design to state how we tried to minimize the stress caused by multiple behavioral assays.

      Include pre-test habituation protocols in methods.

      Discuss potential cumulative stress effects in limitations.

      (5) Grooming classification

      While the reviewer is correct that grooming can be a stress-relieving behavior, it also has many other functions, from the pragmatic to the social. In our study grooming occurred primarily in the periphery of the open field test, where it corresponded to neural activity patterns that differed from those that occurred in the center. As we classify the behavior in the center zone of the open field test as anxiety-like, we interpreted the peripheral grooming as indicative of the animal's adjustment to a novel environment, as suggested by previous work (Estanislau et al., Neurosci. Res., 2013; Rojas-Carvajal et al., Animal Behaviour, 2018). The grooming was primarily rostral body-licking, which accords with what Rojas-Carvajal et al. call a “de-arousal inhibition system” that subserves novelty habituation. The duration and nature of this behavior are, interestingly, influenced by whether the mouse or rat lived in an enriched environment prior to the OFT (enriched environments made them quicker to explore a new environment but also quicker to lose interest in it).

      We did not explain any of this in the manuscript, however, so in our revision, we will make sure to discuss these nuances and cite the relevant literature.

      (6) Integrate neuronal activity and behavioral data

      We will:

      Include additional analyses quantifying neuronal activity overlap across tasks and refine our Discussion to better integrate these findings with prior literature.

      Perform cross-correlation analyses to quantify activity overlap between OFT, EPM, and SI tasks.
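      One way such an overlap analysis could be set up is sketched below. This is only an illustration under stated assumptions, not the authors' pipeline: the array layout (one row per task, one column per tracked neuron), the function names, and the synthetic data are all hypothetical. It computes pairwise Pearson correlations between per-neuron response vectors and a Jaccard index for the sets of task-responsive neurons.

```python
import numpy as np

def task_similarity(responses):
    """Pairwise Pearson correlation between tasks.

    responses: array of shape (n_tasks, n_neurons); each row holds a
    hypothetical per-neuron response index for one task (e.g. OFT, EPM, SI).
    """
    return np.corrcoef(responses)

def jaccard_overlap(active_a, active_b):
    """Fraction of neurons responsive in both tasks among those responsive in either."""
    a = np.asarray(active_a, dtype=bool)
    b = np.asarray(active_b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 0.0
    return float(np.logical_and(a, b).sum() / union)

# Synthetic data for illustration only: 3 tasks x 50 neurons.
rng = np.random.default_rng(0)
responses = rng.normal(size=(3, 50))
corr = task_similarity(responses)

# Two hypothetical responsiveness masks over four neurons.
overlap = jaccard_overlap([1, 1, 0, 0], [1, 0, 1, 0])
```

      Correlations near zero across tasks would be consistent with decorrelated (task-specific) ensembles, whereas a high Jaccard index would indicate that the same neurons are recruited in both tasks.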

      Minor weaknesses

      - Clarify the cohorts of mice that were used for each behavioral assay.

      - Adjust Figure 2G scale and add insets to highlight sniffing differences.

      - Specify that M1/M2 were age-/sex-matched unfamiliar mice in the three-chamber test.

      - Detail statistical tests (e.g., mixed-effects models) and animal selection criteria in methods.

      We believe these revisions will address the reviewers’ major concerns and significantly improve the manuscript. We welcome further feedback on these plans and will provide updated figures/data for the resubmission.

    1. eLife Assessment

      The authors of this important study investigate how telomere length regulates hTERT expression via non-telomeric binding of the telomere-associated protein TRF2. They conclusively show that TRF2 binding to long telomeres results in a reduction in its binding to the hTERT promoter, while short telomeres restore TRF2 binding in the hTERT promoter, recruiting repressor complexes like PRC2, and suppressing hTERT expression. There is convincing support for the claims and the findings should be of broad interest for cell biologists and those working in fields where telomeres alter function, such as cancer and aging.

    2. Reviewer #1 (Public review):

      Summary:

      The authors in this study extensively investigate how telomere length (TL) regulates hTERT expression via non-telomeric binding of the telomere-associated protein TRF2. They conclusively show that TRF2 binding to long telomeres results in a reduction in its binding to the hTERT promoter. In contrast, short telomeres restore TRF2 binding in the hTERT promoter, recruiting repressor complexes like PRC2, and suppressing hTERT expression. The study presents several significant findings revealing a previously unknown mechanism of hTERT regulation by TRF2 in a TL-dependent manner.

      Strengths:

      (1) A previously unknown mechanism linking telomere length and hTERT regulation through the non-telomeric TRF2 protein has been established, strengthening our understanding of telomere biology.

      (2) The authors used both cancer cell lines and iPSCs to showcase their hypothesis and multiple parameters to validate the role of TRF2 in hTERT regulation.

      (3) Comprehensive integration of the recent literature findings and implementation in the current study.

      (4) In vivo validation of the findings.

      (5) Rigorous controls and well-designed assays have been used.

      Weaknesses:

      (1) The authors should comment on the cell proliferation and morphology of the engineered cell lines with ST or LT.

      (2) Also, the entire study uses engineered cell lines with artificially elongated or shortened telomeres. These conclusively demonstrate hTERT regulation by TRF2 in a telomere-length-dependent manner, but using ALT-negative cell lines with naturally short telomeres versus those with long telomeres would give a better perspective. Primary cells could also be used in this context.

      (3) The authors set up time-dependent telomere length changes by dox induction, which may differ from the gradual telomere attrition or elongation that occurs naturally during aging, disease progression, or therapy. This aspect should be explored.

      (4) How does the hTERT regulation by TRF2 in a TL-dependent manner affect the ETS binding on hTERT mutant promoter sites?

      (5) Stabilization of the G-quadruplex structures in ST and LT conditions along with the G4 disruption experimentation (demonstrated by the authors) will strengthen the hypothesis.

      (6) The telomere length and the telomerase activity are not very consistent (Figure 2A, and S1A, Figure 4B and S3). Please comment.

      (7) Please comment on the other telomere-associated proteins or regulatory pathways that might contribute to hTERT expression based on telomere length.

    3. Reviewer #2 (Public review):

      Summary:

      Telomeres are key genomic structures linked to everything from aging to cancer. These key structures at the end of chromosomes protect them from degradation during replication and rely on a complex made up of human telomerase RNA gene (hTERC) and human telomerase reverse transcriptase (hTERT). While hTERC is expressed in all cells, the amount of hTERT is tightly controlled. The main hypothesis being tested is whether telomere length itself could regulate the hTERT enzyme. The authors conducted several experiments with different methods to alter telomere length and measured the binding of key regulatory proteins to this gene. It was generally observed that the shortening of telomere length leads to the recruitment of factors that reduce hTERT expression and lengthening of telomeres has the opposite effect. To rule out direct chromatin looping between telomeres and hTERT as the driver of this effect, artificial constructs were designed and inserted a significant distance away, and similar results were obtained.

      Overall, the claims of telomere length-dependent regulation of hTERT are supported throughout the manuscript.

      Strengths:

      The paper has several important strengths. Firstly, it uses several methods and cell lines that consistently demonstrate the same directionality of the findings. Secondly, it builds on established findings in the field but still demonstrates how this mechanism is separate from that which has been observed. Specifically, designing and implementing luciferase assays in the CCR5 locus supports that direct chromatin looping isn't necessary to drive this effect with TRF2 binding. Another strength of this paper is that it has been built on a variety of other studies that have established principles such as G4-DNA in the hTERT locus and TRF2 binding to these G4 sites.

      Weaknesses:

      The largest technical weakness of the paper is that minimal replicates are used for each experiment. I understand that these kinds of experiments are quite costly, and many of the effects are quite large; however, experiments such as the flow cytometry or the iPSC telomere length and activity assays appear to be based on a single sample, and several are based on two or at most three biological replicates. If samples were added, the main effects would likely hold, and many of the assays using GAPDH as a control would result in significant differences between the groups. This unnecessarily weakens the strength of the claims.

      Another detail that weakens the confidence in the claims is that throughout the manuscript there are several examples of the control group with zero variance between any of the samples: e.g. Figure 2K, Figure 3N, and Figure 6G. It is my understanding that a delta delta method has been used for calculation (though no exact formula is reported and would assist in understanding). If this is the case, then an average of the control group would be used to calculate that fold change and variance would exist in the group. The only way I could understand those control group samples always set to 1 is if a tube of cells was divided into conditions and therefore normalized to the control group in each case. A clearer description in the figure legend and methods would be required if this is what was done and repeated measures ANOVA and other statistics should accompany this.
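      The distinction drawn above can be made concrete with a small numerical sketch. It assumes the standard 2^(-ΔΔCt) fold-change calculation for qPCR-type data, which the manuscript does not confirm; all Ct values and function names are hypothetical and chosen only for illustration. Normalizing to the mean of the control group leaves variance among the control replicates, whereas normalizing each replicate to its own paired control forces every control value to exactly 1.

```python
import numpy as np

# Hypothetical Ct values for three biological replicates (illustration only).
ct_target_ctrl = np.array([24.1, 24.6, 23.8])   # target gene, control cells
ct_ref_ctrl = np.array([18.0, 18.3, 17.9])      # reference gene, control cells
ct_target_treat = np.array([26.0, 26.9, 25.5])  # target gene, treated cells
ct_ref_treat = np.array([18.1, 18.2, 18.0])     # reference gene, treated cells

def delta_ct(ct_target, ct_ref):
    """dCt: target Ct minus reference-gene Ct, per replicate."""
    return ct_target - ct_ref

def fold_change_vs_mean_control(d_ctrl, d_sample):
    """2^-ddCt using the mean control dCt as the calibrator."""
    return 2.0 ** -(d_sample - d_ctrl.mean())

def fold_change_paired(d_ctrl, d_sample):
    """2^-ddCt using each replicate's own control dCt as the calibrator."""
    return 2.0 ** -(d_sample - d_ctrl)

d_ctrl = delta_ct(ct_target_ctrl, ct_ref_ctrl)
d_treat = delta_ct(ct_target_treat, ct_ref_treat)

# Control group expressed relative to itself under each scheme.
ctrl_mean_norm = fold_change_vs_mean_control(d_ctrl, d_ctrl)  # varies around 1
ctrl_paired = fold_change_paired(d_ctrl, d_ctrl)              # exactly 1 for every replicate

# Treated group under the paired scheme (suited to repeated-measures statistics).
treat_paired = fold_change_paired(d_ctrl, d_treat)
```

      Under the paired scheme the control carries no variance by construction, so the replicate-to-replicate variability lives entirely in the treated values, which is why a paired or repeated-measures test would be the appropriate companion analysis.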

      A final technical weakness of the paper is the data in Figure 5 where the modified hTERT promoter was inserted upstream of the luciferase gene. Specifically, it is unclear why data was not directly compared between the constructs that could and could not form G4s to make this point. For this reason, the large variance in several samples, and minimal biological replicates, this data was the least convincing in the manuscript (though other papers from this laboratory and others support the claim, it is not convincing standalone data).

      The second largest weakness of the paper is formatting.

      When I initially read the paper without a careful reading of the methods, I thought that the authors did not have appropriate controls, meaning that if a method is applied to lengthen, there should be one that is not lengthened, and when a method is applied to shorten, one which is not shortened should be analysed as well. In fact, this is what the authors have done with isogenic controls. However, describing all samples as either telomere short or telomere long, while it simplifies the writing and the colour scheme, makes it less clear that each experiment is performed relative to an unmodified control. I would suggest putting the isogenic control in one colour, the artificially shortened in another, and the artificially lengthened in another.

      Similarly, the graphs, in general, should be consistent with labelling. Figure 2 was the most confusing. I would suggest one dotted line with cell lines above it, and then the method of either elongation or shortening below it, i.e. HT1080 above, hTERC overexpression below; MDAMB-231 above, guanine terminal repeats below, as was done on the right. Figure 2 readability would also be improved by putting hTERT promoter GAPDH (-ve control) under each graph that uses this (Panel B and Panel C, not just Panel C). All information is contained in the manuscript, but one must currently flip between figure legends, methods, and figures to understand what was done, and this reduces clarity for the reader.

    4. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors in this study extensively investigate how telomere length (TL) regulates hTERT expression via non-telomeric binding of the telomere-associated protein TRF2. They conclusively show that TRF2 binding to long telomeres results in a reduction in its binding to the hTERT promoter. In contrast, short telomeres restore TRF2 binding in the hTERT promoter, recruiting repressor complexes like PRC2, and suppressing hTERT expression. The study presents several significant findings revealing a previously unknown mechanism of hTERT regulation by TRF2 in a TL-dependent manner.

      Strengths:

      (1) A previously unknown mechanism linking telomere length and hTERT regulation through the non-telomeric TRF2 protein has been established, strengthening our understanding of telomere biology.

      (2) The authors used both cancer cell lines and iPSCs to showcase their hypothesis and multiple parameters to validate the role of TRF2 in hTERT regulation.

      (3) Comprehensive integration of the recent literature findings and implementation in the current study.

      (4) In vivo validation of the findings.

      (5) Rigorous controls and well-designed assays have been used.

      Weaknesses:

      (1) The authors should comment on the cell proliferation and morphology of the engineered cell lines with ST or LT.

      The cell proliferation and morphology of the engineered cells were monitored during experiments. With a doubling time within 16-18 hours, all the cancer cell line pairs used in the study were counted and seeded equally before experiments.

      No significant difference in morphology or cell count (before harvesting for experiments) was noted for the stable cell lines, namely, HT1080 ST-HT1080 LT, HCT116 p53 null scrambled control-HCT116 p53 null hTERC knockdown.

      MDAMB 231 cells were treated with guanine-rich telomere repeats (GTR) over a period of 12 days, as per the protocol described in Methods. Because GTR treatment in serum-free media on alternate days was followed by replenishment with serum-supplemented media, we noted that the cells underwent a periodic delay in proliferation (or transient arrest) aligning with the GTR oligo-feeding cycles and appeared somewhat larger than their parental untreated cells.

      Next, the cells with Cas9-telomeric sgRNA-mediated telomere trimming were maintained transiently (until 3 days after transfection). During this time, no significant change in morphology or cell proliferation was observed in any of the cell lines, namely HCT116 or HEK293T Gaussia Luciferase reporter cells. iPSCs were also monitored; no change in morphology or cellular proliferation was observed during the 5 days post-transfection and antibiotic selection.

      (2) Also, the entire study uses engineered cell lines with artificially elongated or shortened telomeres. These conclusively demonstrate hTERT regulation by TRF2 in a telomere-length-dependent manner, but using ALT-negative cell lines with naturally short telomeres versus those with long telomeres would give a better perspective. Primary cells could also be used in this context.

      The reviewer correctly highlights (as we also acknowledge in the Discussion) that our study primarily utilizes engineered cell lines with artificially elongated or shortened telomeres. We agree that using ALT-negative cells with naturally short versus long telomeres would provide additional perspective in testing our hypothesis. However, a key challenge in this experimental setup is the inherent variation in TRF2 protein levels among these cell types—a parameter central to our hypothesis. Comparing observations across such non-isogenic cell line pairs would require extensive normalization for multiple factors and could introduce additional complexities, potentially raising more questions among scientific readers.

      We had also explored primary cells, specifically foreskin fibroblasts and MRC5 lung fibroblasts, as suggested by the reviewer. However, we encountered two significant challenges. To achieve a notable telomere length difference of at least 20%, these primary cells had to undergo a minimum of 25 passages. During this period, we observed a substantial decline in their proliferation capacity and an increased tendency toward replicative senescence. Additionally, we noted a significant reduction in TRF2 protein levels as the primary cells aged, consistent with findings from Fujita K et al., 2010 (Nat Cell Biol.), which reported p53-induced, Siah-1-mediated proteasomal degradation of TRF2. Due to these practical limitations, we focused on cancerous cell lines with an isogenic background, ensuring a controlled experimental framework. This, in turn, opens new avenues for future research to explore broader implications. Investigating other primary cell types that may not present these challenges could be a valuable direction for future studies.

      (3) The authors set up time-dependent telomere length changes by dox induction, which may differ from the gradual telomere attrition or elongation that occurs naturally during aging, disease progression, or therapy. This aspect should be explored.

      In this study, we utilized a Doxycycline-inducible hTERT expression system to modulate telomere length in cancer cells, aiming to capture any gradual changes that might occur upon steady telomerase induction or overexpression—an event frequently observed in cancer progression. We monitored telomere length and telomerase activity at regular intervals (Supplementary Figure 2), noting a gradual increase until a characteristic threshold was reached, followed by a reversal to the initial telomere length.

      While this model provides interesting insights in the context of cancer cells, it does not replicate the conditions of aging or therapeutic intervention. We agree that exploring telomere length-dependent regulation of hTERT in normal aging cells is an important avenue for future research. Investigating TRF2 occupancy on the hTERT promoter in response to telomere length alterations induced by therapeutic agents, such as telomestatin or imetelstat (telomerase inhibitors) and 6-thio-2’-deoxyguanosine (telomere damage inducer), would provide valuable insights and warrants further exploration.

      (4) How does the hTERT regulation by TRF2 in a TL-dependent manner affect the ETS binding on hTERT mutant promoter sites?

      In our previous study (Sharma et al., 2021, Cell Reports), we experimentally demonstrated that GABPA and TRF2 do not compete for binding at the mutant hTERT promoter (Figure 4M-R). Silencing GABPA in various mutant hTERT promoter cells did not increase TRF2 binding. While GABPA has been reported to show increased binding at the mutant promoter compared to the wild-type (Bell et al., 2015, Science), no telomere length (TL) sensitivity has been noted yet. This manuscript shows that telomere alterations in hTERT mutant cells do not significantly increase TRF2 occupancy at the promoter, reinforcing our earlier findings that G-quadruplex formation is crucial for TRF2 recruitment. Since TRF2 binding does not increase significantly at the mutant promoter and does not compete with GABPA, TL-sensitive TRF2 binding is unlikely to directly influence ETS binding by GABPA. Hence, the increased GABPA binding to the mutant promoter reported in the literature remains independent of TL-sensitive TRF2 binding. However, this inference is based on existing observations, and a direct experimental test would be needed to answer the query definitively.

      (5) Stabilization of the G-quadruplex structures in ST and LT conditions along with the G4 disruption experimentation (demonstrated by the authors) will strengthen the hypothesis.

      We agree with the reviewer’s suggestion that stabilizing G-quadruplex (G4) structures in mutant promoter cells under ST and LT conditions would further strengthen our hypothesis. From our ChIP experiments on hTERT promoter mutant cells following G4 stabilization with ligands, as reported in Sharma et al. 2021 (Figure 5G), we observed that TRF2 occupancy was regained in the telomere-length unaltered versions of -124G>A and -146G>A HEK293T Gaussia luciferase cells (referred to as LT cells in the current manuscript).

      Based on these published findings, we anticipate a similar restoration of TRF2 binding in the short telomere (ST) versions, given the increased availability of TRF2 protein molecules, as proposed in our Telomere Sequestration Partitioning model.

      (6) The telomere length and the telomerase activity are not very consistent (Figure 2A, and S1A, Figure 4B and S3). Please comment.

      In this study, we employed both telomerase-dependent and independent methods for telomere elongation.

      HT1080 model: Telomere elongation resulted from constitutive overexpression of hTERC and hTERT, leading to a direct correlation with telomerase activity.

      HCT116 (p53-null) model: hTERC silencing in ST cells, a known limiting factor for telomerase activity, resulted in significantly lower telomerase activity and a 1.5-fold telomere length difference.

      MDAMB231 model: Guanine-rich telomeric repeat (GTR) feeding induced telomere elongation through recombinatorial mechanisms (Wright et al., 1996), leading to significant telomere length gain but no notable change in telomerase activity.

      HCT116 Cas9-telomeric sgRNA model: Telomere shortening occurred without modifying telomerase components, resulting in a minor, insignificant increase in telomerase activity (Figure 2A, S1).

      Regarding xenograft-derived HT1080 ST and LT cells (Figure 4B, S3), the observed variability in telomere length and telomerase activity may stem from infiltrating mouse cells, which naturally have longer telomeres and higher telomerase activity than human cells. Since tumour masses were not sorted to exclude mouse cells in the reported assay, using species-specific markers or fluorescently labelled HT1080 cells in future experiments would minimize this bias. However, even though the telomere length and telomerase activity assays cannot distinguish human from mouse cells, the mRNA analysis and ChIP experiments performed specifically for hTERT and hTERC mRNA levels, TRF2 occupancy, and H3K27me3 enrichment on the hTERT promoter (Figure 4B–E) strongly support our conclusions.

      (7) Please comment on the other telomere-associated proteins or regulatory pathways that might contribute to hTERT expression based on telomere length.

      The current study provides experimental evidence that TRF2, a well-characterized telomere-binding protein, mediates crosstalk between telomeres and the regulatory region of the hTERT gene in a telomere length-dependent manner. Given the observed link between hTERT expression and telomere length, it is likely that additional telomere-associated proteins and regulatory pathways contribute to this regulation.

      The remaining shelterin complex components—POT1, hRap1, TRF1, TIN2, and TPP1—may play crucial roles in this context, as they are integral to telomere maintenance and protection. Additionally, several DNA damage response (DDR) proteins, which interact with telomere-binding factors and help preserve telomere integrity, could potentially influence hTERT regulation in a telomere length-dependent manner. However, direct interactions or regulatory roles would require further experimental validation. Another group of proteins with potential relevance in this mechanism are the sirtuins, which directly associate with telomeres and are known to positively regulate telomere length, undergoing repression upon telomere shortening. Notably, SIRT1 has been reported to interact with telomerase (Lee SE et al., 2024, Biochem Biophys Res Commun.), while SIRT6 has been implicated in TRF2 degradation and telomerase activation. Given their roles in telomere homeostasis, sirtuins may serve as key mediators of telomere length-dependent hTERT regulation.

      Beyond protein-mediated mechanisms like the Telomere Sequestration partitioning model, telomere length-dependent regulation of hTERT may also involve chromatin architecture. The Telomere Position Effect—Over Long Distances (TPE-OLD), a phenomenon whereby telomere conformation influences gene expression at distant loci, has been reviewed extensively (Kim et al., 2018, Differentiation).

      Reviewer #2 (Public review):

      Summary:

      Telomeres are key genomic structures linked to everything from aging to cancer. These key structures at the end of chromosomes protect them from degradation during replication and rely on a complex made up of human telomerase RNA gene (hTERC) and human telomerase reverse transcriptase (hTERT). While hTERC is expressed in all cells, the amount of hTERT is tightly controlled. The main hypothesis being tested is whether telomere length itself could regulate the hTERT enzyme. The authors conducted several experiments with different methods to alter telomere length and measured the binding of key regulatory proteins to this gene. It was generally observed that the shortening of telomere length leads to the recruitment of factors that reduce hTERT expression, and that lengthening of telomeres has the opposite effect. To rule out direct chromatin looping between telomeres and hTERT as driving this effect, artificial constructs were designed and inserted a significant distance away, and similar results were obtained.

      Overall, the claims of telomere length-dependent regulation of hTERT are supported throughout the manuscript.

      Strengths:

      The paper has several important strengths. Firstly, it uses several methods and cell lines that consistently demonstrate the same directionality of the findings. Secondly, it builds on established findings in the field but still demonstrates how this mechanism is separate from that which has been observed. Specifically, designing and implementing luciferase assays in the CCR5 locus supports that direct chromatin looping isn't necessary to drive this effect with TRF2 binding. Another strength of this paper is that it has been built on a variety of other studies that have established principles such as G4-DNA in the hTERT locus and TRF2 binding to these G4 sites.

      Weaknesses:

      The largest technical weakness of the paper is that minimal replicates are used for each experiment. I understand that these kinds of experiments are quite costly, and many of the effects are quite large; however, experiments such as the flow cytometry or the iPSC telomere length and activity assays appear to be based on a single sample, and several are based upon two, or at most three, biological replicates. If samples were added, the main effects would likely hold, and many of the assays using GAPDH as a control would result in significant differences between the groups. This unnecessarily weakens the strength of the claims.

      We appreciate the reviewer’s recognition of the resource-intensive nature of our experiments, and we are confident in the robustness of the observed results. Due to the project’s timeline constraints and the need for consistency across experiments, we have reported findings based on 3 biological replicates with appropriate statistical analysis.

      Regarding the fibroblast-iPSC model, we would like to clarify that we have presented data from two independent biological replicates, each consisting of a fibroblast and its derived iPS cell pair, rather than a single sample. Additionally, the Tel-FACS assay involved analyzing at least 10,000 events, ensuring statistical significance in all cases. Alongside this, we also conducted qRT-PCR-based telomere length determination assays. While both assays were performed, we chose to report the more sensitive Tel-FACS data in the manuscript to provide a clearer representation of the results.

      Another detail that weakens the confidence in the claims is that throughout the manuscript there are several examples of the control group with zero variance between any of the samples: e.g. Figure 2K, Figure 3N, and Figure 6G. It is my understanding that a delta delta method has been used for calculation (though no exact formula is reported and would assist in understanding). If this is the case, then an average of the control group would be used to calculate that fold change and variance would exist in the group. The only way I could understand those control group samples always set to 1 is if a tube of cells was divided into conditions and therefore normalized to the control group in each case. A clearer description in the figure legend and methods would be required if this is what was done and repeated measures ANOVA and other statistics should accompany this.

      We thank the reviewer for their valuable feedback. In response to the comment about the control group and error calculation, we would like to clarify our approach. In our previous analysis, we set the control group (Day 0) as 1 to calculate the fold change and did not include error bars, as there was no variation in the control group (since all values were normalized to 1). However, as per the reviewer’s suggestion, we will now include error bars on the Day 0 control group. These error bars will be calculated based on the standard deviation (SD) of the Ct values across the biological replicates for the control group. For the Day 10 and Day 24 time points, we retain the error bars that reflect the variance in fold change across replicates, as originally reported.

      This adjustment would allow for a clearer representation of the data and variance in the control group. We believe this addresses the reviewer’s concerns about the error calculation, and we shall update the figure legend and methods to reflect these changes. Statistical analysis, including ANOVA, was already applied as indicated in the figure.
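
The exchange above turns on how fold change is computed from Ct values. As a minimal sketch, assuming the widely used 2^-ΔΔCt calculation (the exact formula used in the manuscript is not reported, so this may differ from the authors' procedure), the code below calibrates every replicate against the mean control ΔCt. All Ct values are hypothetical and serve only to show that the control group then keeps replicate-to-replicate variance instead of being fixed at exactly 1.

```python
from statistics import mean, stdev


def fold_changes(ct_target, ct_ref, ct_target_ctrl, ct_ref_ctrl):
    """Per-replicate fold changes via 2^-ddCt, calibrated to the mean control dCt."""
    # dCt = Ct(target gene) - Ct(reference gene), computed per replicate
    dct_ctrl = [t - r for t, r in zip(ct_target_ctrl, ct_ref_ctrl)]
    calibrator = mean(dct_ctrl)  # a single calibrator shared by all samples
    dct = [t - r for t, r in zip(ct_target, ct_ref)]
    # ddCt = dCt(sample) - mean dCt(control); fold change = 2^-ddCt
    return [2 ** -(d - calibrator) for d in dct]


# Hypothetical Ct values for three biological replicates (illustration only)
ctrl_target = [25.0, 25.4, 24.8]
ctrl_ref = [18.0, 18.1, 17.9]
treated_target = [27.0, 27.3, 26.9]
treated_ref = [18.0, 18.2, 17.9]

ctrl_fc = fold_changes(ctrl_target, ctrl_ref, ctrl_target, ctrl_ref)
treated_fc = fold_changes(treated_target, treated_ref, ctrl_target, ctrl_ref)

print("control fold changes:", ctrl_fc, "SD:", stdev(ctrl_fc))
print("treated fold changes:", treated_fc, "SD:", stdev(treated_fc))
```

With this calibration, the control replicates scatter around 1 (their geometric mean is exactly 1), so an error bar on the control group is meaningful. Normalizing each replicate to its own paired control would instead force every control value to 1, which is the situation the reviewer describes.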

      A final technical weakness of the paper is the data in Figure 5 where the modified hTERT promoter was inserted upstream of the luciferase gene. Specifically, it is unclear why data was not directly compared between the constructs that could and could not form G4s to make this point. For this reason, the large variance in several samples, and minimal biological replicates, this data was the least convincing in the manuscript (though other papers from this laboratory and others support the claim, it is not convincing standalone data).

      We appreciate the reviewer's thoughtful feedback on the presentation of the luciferase assay data in Figure 5. The data for the wild-type hTERT promoter (capable of forming G4 structures) was previously reported in Figure 2G-K. To avoid redundancy in data presentation, we initially chose to report the results of the mutated promoter separately. However, we recognize that directly comparing the wild-type and mutated promoter constructs within the same figure would provide clearer context and strengthen the interpretation of the results. In light of this, we will revise Figure 5 in the updated manuscript to include the data for both constructs, ensuring a more comprehensive and informative comparison.

      The second largest weakness of the paper is formatting.

      When I initially read the paper without a careful reading of the methods, I thought that the authors did not have appropriate controls: if a method is applied to lengthen telomeres, there should be a sample that is not lengthened, and when a method is applied to shorten them, a sample that is not shortened should be analysed as well. In fact, this is what the authors have done with isogenic controls. However, describing all samples as either telomere short or telomere long, while it simplifies the writing and the colour scheme, makes it less clear that each experiment is performed relative to an unmodified control. I would suggest putting the isogenic control in one colour, the artificially shortened in another, and the artificially lengthened in another.

      Similarly, the graphs, in general, should be consistent in their labelling. Figure 2 was the most confusing. I would suggest one dotted line with cell lines above it, and then the method of either elongation or shortening below it, i.e. HT1080 above, hTERC overexpression below; MDA-MB-231 above, guanine terminal repeats below, as was done on the right. Figure 2 readability would also be improved by putting hTERT promoter GAPDH (-ve control) under each graph that uses this (Panel B and Panel C, not just Panel C). All information is contained in the manuscript, but one must currently flip between figure legends, methods, and figures to understand what was done, and this reduces clarity for the reader.

      We sincerely thank the reviewer for their constructive feedback on the formatting and clarity of the figures. We appreciate the time and effort taken to suggest ways to enhance the visual presentation and readability of the manuscript. We agree that clearer differentiation of the experimental groups would help avoid confusion, and we will consider ways to improve the visual organization, as much as possible. Additionally, we will work on restructuring the graphs for greater consistency in labeling and alignment, especially in Figure 2, to improve readability and reduce the need for cross-referencing between the figures, figure legends, and methods section. We will also ensure the hTERT promoter GAPDH (-ve control) label appears under all relevant graphs for consistency. We will make revisions to the figures in line with these suggestions to improve the overall clarity and flow of the manuscript, as much as possible.

    1. eLife Assessment

      This study provides an important method to model the statistical biases of hypermutations during the affinity maturation of antibodies. The authors show convincingly that their model outperforms previous methods with fewer parameters; this is made possible by the use of machine learning to expand the context dependence of the mutation bias. They also show that models learned from nonsynonymous mutations and from out-of-frame sequences are different, prompting new questions about germinal center function. Strengths of the study include an open-access tool for using the model, a careful curation of existing datasets, and a rigorous benchmark; it is also shown that machine-learning methods are currently limited by the availability of data, which explains the only modest gain in model performance afforded by modern machine learning.

    2. Reviewer #1 (Public review):

      Summary:

      This paper introduces a new class of machine learning models for capturing how likely a specific nucleotide in a rearranged IG gene is to undergo somatic hypermutation. These models modestly outperform existing state-of-the-art efforts, despite having fewer free parameters. A surprising finding is that models trained on all mutations from non-functional rearrangements give divergent results from those trained on only silent mutations from functional rearrangements.

      Strengths:

      (1) The new model structure is quite clever and will provide a powerful way to explore larger models.

      (2) Careful attention is paid to curating and processing large existing data sets.

      (3) The authors are to be commended for their efforts to communicate with the developers of previous models and use the strongest possible versions of those in their current evaluation.

      Weaknesses:

      (1) 10x/single cell data has a fairly different error profile compared to bulk data. A synonymous model should be built from the same `briney` dataset as the base model to validate the difference between the two types of training data.

      (3) The decision to test only kernels of 7, 9, and 11 is not described. The selection/optimization of embedding size is not explained. The filters listed in Table 1 are not defined.

    3. Reviewer #2 (Public review):

      This work offers an insightful contribution for researchers in computational biology, immunology, and machine learning. By employing a 3-mer embedding and CNN architecture, the authors demonstrate that it is possible to extend sequence context without exponentially increasing the model's complexity.

      Key findings include:

      (1) Efficiency and Performance: Thrifty CNNs outperform traditional 5-mer models and match the performance of significantly larger models like DeepSHM.

      (2) Neutral Mutation Data: A distinction is made between using synonymous mutations and out-of-frame sequences for model training, with evidence suggesting these methods capture different aspects of SHM, or different biases in the type of data.

      (3) Open Source Contributions: The release of a Python package and pre-trained models adds practical value for the community.

      However, readers should be aware of the limitations. The improvements over existing models are modest, and the work is constrained by the availability of high-quality out-of-frame sequence data. The study also highlights that more complex modeling techniques, like transformers, did not enhance predictive performance, which underscores the role of data availability in such studies.

    4. Reviewer #3 (Public review):

      Summary:

      Modeling and estimating sequence context biases during B cell somatic hypermutation is important for accurately modeling B cell evolution to better understand responses to infection and vaccination. Sung et al. introduce new statistical models that capture a wider sequence context of somatic hypermutation with a comparatively small number of additional parameters. They demonstrate their model's performance with rigorous testing across multiple subjects and datasets. Prior work has captured the mutation biases of fixed 3-, 5-, and 7-mers, but each of these expansions has significantly more parameters. The authors developed a machine-learning-based approach to learn these biases using wider contexts with comparatively few parameters.
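
The parameter argument in this summary can be made concrete with a little arithmetic. The sketch below compares a lookup table holding one rate per k-mer with a small embedding-plus-convolution model. The layer layout and the sizes passed to `embedding_cnn_params` are hypothetical choices for illustration, not the architecture or hyperparameters reported in the paper.

```python
def kmer_table_params(k):
    """One free parameter per possible k-mer over the 4-letter DNA alphabet."""
    return 4 ** k


def embedding_cnn_params(embed_dim, kernel_size, n_filters):
    """Parameter count for an assumed model: a 3-mer embedding table,
    one 1-D convolution with bias, and a linear readout to a single rate."""
    embedding = 4 ** 3 * embed_dim                         # 64 possible 3-mers
    conv = embed_dim * kernel_size * n_filters + n_filters  # weights + biases
    readout = n_filters + 1                                # weights + bias
    return embedding + conv + readout


for k in (3, 5, 7):
    print(f"{k}-mer table: {kmer_table_params(k)} parameters")

# Hypothetical sizes: embedding dimension 7, kernel width 11, 16 filters
print("embedding + CNN:", embedding_cnn_params(7, 11, 16), "parameters")
```

The table grows fourfold with every added base of context, whereas widening the convolution kernel adds only `embed_dim * n_filters` parameters per extra position under these assumptions.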

      Strengths:

      Well-motivated and defined problem. Clever solution to expand nucleotide context. Complete separation of training and test data by using different subjects for training vs testing. Release of open-source tools and scripts for reproducibility.

      Weaknesses:

      This study could be improved with better descriptions of dataset sequencing technology, sequencing depth, etc., but this is a minor weakness.

    1. eLife Assessment

      Using an unbiased approach, this important study discovered a role of Ezh2 in the differentiation of granule neuron precursors, the cell of origin for Shh group of medulloblastoma. Furthermore, the authors also provided solid evidence that combined inhibition of Ezh2 and CDK4/6 likely represents a promising strategy for the treatment of this subgroup of MB. Validation of these findings using the FDA-approved Ezh2 inhibitor is needed to further strengthen this preclinical study.

    2. Reviewer #1 (Public review):

      In this manuscript, Purzner and colleagues examine the role of Ezh2 in cerebellar development and tumorigenesis using animal models of SHH medulloblastoma (MB). While Ezh2 plays a relatively minor role in granule neuron development and SHH MB, the authors demonstrate that Ezh2 inhibition, when combined with enforced cell cycle exit, promotes MB cell differentiation and potentially reduces malignancy. Overall, this study is solid and provides valuable insights into Ezh2 regulation in cerebellar development and SHH-MB tumorigenesis.

      Strengths:

      The authors investigate the role of Ezh2 in granule neuronal differentiation during cerebellar development and medulloblastoma (MB) progression, integrating multi-omics for a comprehensive epigenetic analysis. The use of Ezh2 conditional knockout (cKO) mice and combination therapy with Ezh2 and CDK4/6 inhibitors shows a promising strategy to induce terminal differentiation in MB cells, with potential therapeutic implications. Additionally, analysis of human SHH-MB samples reveals that higher EZH2 expression correlates with worse survival, indicating the clinical relevance.

      Weaknesses:

      The study does not fully explore compensatory mechanisms of PRC2 given that the phenotype of Ezh2 conditional knockout (cKO) in GNP development and MB tumor formation is relatively mild.

    3. Reviewer #2 (Public review):

      Summary:

      This study used an unbiased approach to evaluate epigenetic dynamics during the differentiation of granule neuron precursors, the cell of origin for Shh-MB. These profiling findings led to the focus on H3K27me3 dynamics, which correlate with the remodeling of epigenetic landscape associated with neuronal differentiation gene activation.

      Strengths:

      Depletion of EZH2, an enzymatic subunit of PRC2, resulted in premature neuronal differentiation in the developing cerebellum.

      Weaknesses:

      Little information is shown about the specific genetic programs disrupted by EZH2 depletion. This is a crucial weakness as existing PRC2 inhibitors do not effectively cross the blood-brain barrier. Further studies are necessary to identify downstream targets of PRC2 that could be targeted to induce neuronal differentiation in MB cells.

    1. eLife Assessment

      This serostudy of blood donors in Bolivia (a country with very high COVID death rates in 2020-21) provides useful insights on the successive viral variants of SARS-CoV-2 over 2021 and 2022. Using compelling antibody and neutralization assays, the authors describe variant specific distributions in the different parts of Bolivia. The main methodological advance is to use serology to understand variant diversity, which in turn helps deepen understanding of "hybrid" immunity from widespread infection (and vaccination).

    2. Reviewer #1 (Public review):

      Summary:

      This study provides valuable and comprehensive information about the SARS-CoV-2 seroprevalence during 2021 and 2022 in different regions of Bolivia. Moreover, data on immune responses against the SARS-CoV-2 variants based on neutralization tests denotes the presence of several virus variants circulating in the Bolivian population. Evidence for seroprevalence data provided by the authors is solid, across the study period, while data regarding variant circulation is limited to the early stages of the pandemic.

      Strengths:

      The major strength of this study is that it provided nationwide seroprevalence estimates from infection and/or vaccination based on antibodies against both spike and the nucleocapsid protein in a large representative sample of sera collected at two time points from all departments of Bolivia, gaining insight into COVID-19 epidemiology. On the other hand, data from virus neutralization assays inferred the circulation during the study period of four SARS-CoV-2 variants in the population. Overall, the study results provide an overview of the level of viral transmission and vaccination and insights into the spread across the country of SARS-CoV-2 variants.

      Weaknesses:

      The assessment of a Lambda variant that circulated in several neighboring countries (Peru, Chile, and Argentina), which had a significant impact on the COVID-19 pandemic in the region, may have strengthened the study to contrast Gamma spread. In addition, even though neutralizing antibodies can certainly reveal previous infections of SARS-CoV-2 variants in the population, it is of limited value to infer from this information some potential timing estimates of specific variant circulation, considering the heterogeneous effects that past infections, vaccinations, or a combination of both could have on the level of variant-specific neutralizing antibodies and/or their cross-neutralization capacity.

      An appraisal of whether the authors achieved their aims, and whether the results support their conclusions.

      The conclusions of this paper are well supported by data, particularly regarding seroprevalence that reliably reflects the epidemiology of COVID-19 in Bolivia, and seroprevalence trends in other low- and middle-income countries.

      A discussion of the likely impact of the work on the field, and the utility of the methods and data to the community.

      Since this is the first study that has been conducted to assess indicators of immunity against SARS-CoV-2 in the population of Bolivia at a nationwide scale, seroprevalence data provided by geographic regions at two time points can be useful as a reference for potential retrospective global meta-analysis and to further explore and compare the risk factors for infection, variant distribution, and the impact on infection and vaccination, gaining deeper insights into understanding the evolution of the COVID-19 pandemic in Bolivia and in the region.

    3. Reviewer #3 (Public review):

      Summary:

      This study attempts to reconstruct the history of the COVID-19 epidemic, with its successive waves of viral variants from SARS-CoV-2 seroprevalence during 2021 and 2022 among blood donors in different regions of Bolivia. By using serological tests "specific" for the various variants the authors try to achieve a "colour" vision that is not provided by standard "black-and-white" serology.

      Strengths and Weaknesses:

      I am not an expert on the performance of SARS-CoV-2 serological tests, so may overlook certain weaknesses. Instead, I tried to assess whether the authors, in this manuscript, have managed to substantiate their claims that "seroprevalence studies are a valuable adjunct to active surveillance because they allow analysis of the level of immunity of a population to a specific pathogen without the need for prospective testing", and that "genomic surveillance and serology offer distinct yet complementary insights thus far." I think they succeeded, as they paint a credible and interesting history of the epidemic in Bolivia using (to me) novel methodology that certainly will stimulate extensive discussion, controversies, and follow-up studies (for which the authors might make some suggestions).

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      This study provides valuable and comprehensive information about the SARS-CoV-2 seroprevalence during 2021 and 2022 in different regions of Bolivia. Moreover, data on immune responses against the SARS-CoV-2 variants based on neutralization tests denotes the presence of several virus variants circulating in the Bolivian population. Evidence for seroprevalence data provided by the authors is solid, across the study period, while data regarding variant circulation is limited to the early stages of the pandemic.

      Strengths:

      The major strength of this study is that it provided nationwide seroprevalence estimates from infection and/or vaccination based on antibodies against both spike and the nucleocapsid protein in a large representative sample of sera collected at two time-points from all departments of Bolivia, gaining insight into COVID-19 epidemiology. On the other hand, data from virus neutralization assays inferred the circulation during the study period of four SARS-CoV-2 variants in the population. Overall, the study results provide an overview of the level of viral transmission and vaccination and insights into the spread across the country of SARS-CoV-2 variants.

      Weaknesses:

      The assessment of a Lambda variant that circulated in several neighboring countries (Peru, Chile, and Argentina), which had a significant impact on the COVID-19 pandemic in the region, may have strengthened the study to contrast Gamma spread. In addition, even though neutralizing antibodies can certainly reveal previous infections of SARS-CoV-2 variants in the population, it is of limited value to infer from this information some potential timing estimates of specific variant circulation, considering the heterogeneous effects that past infections, vaccinations, or a combination of both could have on the level of variant-specific neutralizing antibodies and/or their cross-neutralization capacity.

      An appraisal of whether the authors achieved their aims, and whether the results support their conclusions:

      The conclusions of this paper are well supported by data, particularly regarding seroprevalence that reliably reflects the epidemiology of COVID-19 in Bolivia, and seroprevalence trends in other low- and middle-income countries.

      A discussion of the likely impact of the work on the field, and the utility of the methods and data to the community:

      Since this is the first study that has been conducted to assess indicators of immunity against SARS-CoV-2 in the population of Bolivia at a nationwide scale, seroprevalence data provided by geographic regions at two time-points can be useful as a reference for potential retrospective global meta-analysis and to further explore and compare the risk factors for infection, variant distribution, and the impact on infection and vaccination, gaining deeper insights into understanding the evolution of the COVID-19 pandemic in Bolivia and in the region.

      Reviewer #2 (Public Review):

      Significance of the findings:

      In this study, blood donors were assessed using serology and viral neutralization assays to determine the prevalence of SARS-CoV-2 antibodies. S1 and NCP antibodies were used to distinguish between vaccination and natural infection and virus-specific neut titers were used to determine which variants the antibodies respond to. The study reports almost universal antibody prevalence and increases in antibodies against specific variants at different points corresponding to circulating variants identified phylogenetically in neighbouring countries. The authors propose this approach for settings like Bolivia where genetic sequencing is not readily available. Unfortunately, there are significant limitations to this approach that limit its utility - serological data are available after the fact in a fast-moving pandemic and so are a poor alternative to phylogenetic data. Rather, serological information can supplement phylogenetic data and is most useful in estimating population-level immunity.

      (1) Considerations in interpreting the results:

      We appreciate the reviewer's valuable feedback, which will certainly enhance the quality of our manuscript. As a result, we have revised the text to address their suggestions as thoroughly as possible.

      a. Serology provides different information to phylogenetic sequencing of the viruses and so both are important. Viral sequencing provides real-time information on circulating variants and indicates the proportion of each variant in circulation at any point as there are almost always multiple variants spreading but it is the fastest spreading variant that comes to dominate. Importantly serology measures asymptomatic infections as well, providing population estimates of infection that are not available through viral gene sequencing.

      We underscored this point in the introduction by incorporating the following sentences:

      “Seroprevalence studies are a valuable adjunct to active surveillance because they allow analysis of the level of immunity of a population to a specific pathogen without the need for prospective testing, and also provide information on the frequency of cases that do not attract medical attention (asymptomatic infections)(4).” and “To date, the circulation of SARS-CoV-2 variants has mainly been studied through molecular surveillance, giving the proportion of circulating variants in real time. Therefore, genomic surveillance and serology offer distinct yet complementary insights thus far.”

      b. A major concern in the interpretation of serology is that antibody titers vary markedly over time with rapid declines in the first year post-infection or post-vaccination. However, these declines vary depending on whether hybrid immunity is present. Disentangling this retrospectively is a challenge. A low antibody titer could reflect an infection that occurred a few months ago but may be below the threshold for positivity at the time of testing. There is also substantial individual variability in antibody responses.

      This limitation merits emphasis and has consequently been elaborated upon in the discussion section:

      “Secondly, our results are based on serological data and may not be strictly identical to the genomic data from a quantitative point of view, although they are likely to reflect similar trends and distributions (see below). The results could also be influenced by various factors, including significant individual variation in antibody responses, as well as the decline in antibody titers during the first months following infection or vaccination(31-34), and could therefore be slightly underestimated. As the complexity of SARS-CoV-2 antigen exposure histories increased among tested individuals, we observed a tendency for serological data to start diverging from genomic data. This suggests, as expected, that the effectiveness of this method would be greater if implemented early in an epidemic, when the occurrence of multiple infections with different variants or the administration of varying doses of vaccine in the analyzed population before or after infection (resulting in hybrid immunity) is still limited. However, to mitigate the potential challenges arising from complex antigen exposure, we employed straightforward criteria to identify the variant among the four tested in VNT that exhibited the highest value (cf. Methods), thereby likely indicating the main or most recent infection and minimizing the influence of cross-neutralization on the final outcomes. In addition, several approaches were used to analyze the results, including quantification of circulating antigenic groups and individual variants, yielding results that were comparable and closely aligned with the genomic data.”
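
The "highest value" criterion described in this passage can be sketched as a simple maximum over a sample's neutralization titers. This is only an illustration: the function name, the variant labels, the titer values, and the handling of ties and all-negative samples are assumptions, and the manuscript's Methods define the actual criteria.

```python
def assign_variant(titers):
    """Return the variant with the single highest VNT titer.

    Returns None when no titer is positive or when the top titer is tied,
    which is an assumed rule and not one taken from the manuscript.
    """
    top = max(titers.values())
    if top <= 0:
        return None
    winners = [variant for variant, titer in titers.items() if titer == top]
    return winners[0] if len(winners) == 1 else None


# Hypothetical titers for the panel of four tested viruses
sample_a = {"D614G": 40, "Gamma": 160, "Delta": 20, "Omicron": 10}
sample_b = {"D614G": 80, "Gamma": 80, "Delta": 20, "Omicron": 10}

print(assign_variant(sample_a))  # single highest titer
print(assign_variant(sample_b))  # tied top titer
```

Leaving tied samples unassigned is a conservative choice that avoids attributing a sample to a variant on the basis of cross-neutralization alone.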

      c. Serology becomes increasingly difficult to untangle when an individual has had doses of vaccine and multiple natural infections with different variants. Due to the importance of hybrid immunity in population risk to new variants, it would be useful for estimates of hybrid immunity to be generated based on anti-S1 and anti-NCP antibodies. From a population immunity perspective, this could be important in guiding future protection and boosting strategies.

      We estimated the hybrid immunity for each department in 2021 and 2022 based on the prevalence of anti-S1 and anti-NCP antibodies and added a new Supplementary Table 1. We also added a description of this table in the result section: “The estimated hybrid immunity, based on the prevalence of anti-S1 and anti-NCP antibodies, ranged from 51.4% in Pando to 73.6% in Potosí in 2021. By 2022, this increased to between 83.3% in Santa Cruz and 90.6% in Tarija (Supplementary Table 1).”

      d. Since there is cross-neutralization by the antibodies stimulated by each variant, it is important to establish the sensitivity and specificity of each of the neutralization assays in a panel comprising multiple variants. An assessment of the accuracy of the neut assay for each variant is needed to be confident that it is able to distinguish between variants.

      Assessing the performance of the VNT for each SARS-CoV-2 variant is a highly complex task. This evaluation requires samples with comprehensive data on vaccination and infection specific to each variant to determine the specificity of each VNT for each variant. However, access to such samples for every newly emerging variant remains challenging. To circumvent this issue, we evaluated the circulation level of the γ, δ, and ο variants under increasingly stringent conditions, by calculating the proportion of the population with log2-ratio values of ≤0 (variant titer equal to or greater than D614G), ≤-1 (variant titer at least twice that of D614G), and ≤-2 (variant titer at least four times that of D614G).
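
The three stringency levels above can be illustrated with a short calculation. The sketch assumes the log2-ratio is log2(D614G titer / variant titer), a sign convention inferred from the parenthetical definitions in the response and not stated explicitly. The titer pairs are hypothetical.

```python
import math


def log2_ratio(d614g_titer, variant_titer):
    """log2(D614G / variant): negative when the variant titer is the higher one."""
    return math.log2(d614g_titer / variant_titer)


def proportion_at_or_below(ratios, threshold):
    """Fraction of samples whose log2-ratio is at or below the threshold."""
    return sum(r <= threshold for r in ratios) / len(ratios)


# Hypothetical (D614G titer, variant titer) pairs for five sera
pairs = [(40, 40), (20, 80), (80, 20), (10, 40), (40, 80)]
ratios = [log2_ratio(d614g, variant) for d614g, variant in pairs]

for threshold in (0, -1, -2):
    share = proportion_at_or_below(ratios, threshold)
    print(f"log2-ratio <= {threshold}: {share:.0%} of samples")
```

Each step down in threshold requires the variant titer to double relative to D614G, so the qualifying proportion can only stay equal or shrink as the criterion becomes stricter.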

      e. Blood donors are notoriously poor representations of the general population in many countries, driven partly by whether donation is financially rewarded. For example, in the USA, drug addicts are disproportionately over-represented in blood donor populations as they use it as a source of money. The authors provide no information on whether the blood donor population in Bolivia is representative of the entire population. Comparison of the prevalence of specific disease markers in the general population and in blood donors could provide a signal of their comparability.

      This is a significant aspect addressed in point 3.

      (2) Please provide the sensitivity and specificity of each of the assays so that the reader can assess the degree of accuracy in the assay that claims that the prevalent antibodies are due to, for example, omicron.

      The sensitivity and specificity of the in vitro assays are now referenced in a previous study: “The sensitivity and specificity of the in vitro assays were described previously(23).”

      Neutralization assays are considered the gold standard for measuring neutralizing antibodies against SARS-CoV-2 and its variants, and they are widely used in seroprevalence studies. However, until now, no one has successfully evaluated the specificity and sensitivity of this assay for SARS-CoV-2 variants, as it requires sera from individuals exposed to a single variant, which are increasingly difficult to collect for each newly emerging variant. Nevertheless, using sera from laboratory-infected animals (primarily hamsters) with a single variant exposure has enabled the antigenic characterization of SARS-CoV-2 variants through viral neutralization. This approach has shown that it is possible to distinguish between sera from individuals infected with different variants, even among the Omicron subvariants (Anna Z. Mykytyn et al. Antigenic cartography of SARS-CoV-2 reveals that Omicron BA.1 and BA.2 are antigenically distinct. Sci. Immunol. 7, eabq4450 (2022); Samuel H. Wilks et al. Mapping SARS-CoV-2 antigenic relationships and serological responses. Science 382, eadj0070 (2023)).

      (3) Please provide an assessment of the representativity of the blood donor population eg. Is the prevalence of hepatitis B serological markers in the blood donor population comparable with the prevalence of hepatitis B serological markers in the general population from community-based studies?

      A new sentence was included in the discussion to offer support for considering the blood donor population as a representative sample of the general population: “In addition, in Bolivia, blood donation is unrewarded, and blood donors appear to be quite representative of the general population. Indeed, routine screening for several infection markers (such as HIV or HBV) is conducted in all donors, and the prevalences of these markers do not differ from those observed in the general population. For example, UNAIDS data highlights a 0.4% HIV prevalence within the Bolivian general population, with significantly higher rates exceeding 25% observed in high-risk groups such as men who have sex with men(29). Moreover, Sheena et al. estimated a 0.6% prevalence of HBsAg in Bolivia in 2019(30). Bolivian national statistics of National Blood Program of the Ministry of Health and Sports, indicate that between 2019 and 2023, the proportion of HIV- and HBV-reactive units among screened blood donors ranged from 0.26% to 0.41% and 0.16% to 0.25%, respectively (Dr. Lissete Bautista’s personal communication).”

    1. eLife Assessment

      This study presents a valuable finding on the role of secretory leukocyte protease inhibitors (SLPI) in developing Lyme disease in mice infected with Borrelia burgdorferi. The evidence supporting the claims of the authors is solid. This paper will be of interest to scientists in the infectious inflammatory disease field.

    2. Reviewer #1 (Public review):

      Summary:

      This study demonstrates the significant role of secretory leukocyte protease inhibitor (SLPI) in regulating B. burgdorferi-induced periarticular inflammation in mice. They found that SLPI-deficient mice showed significantly higher B. burgdorferi infection burden in ankle joints compared to wild-type controls. This increased infection was accompanied by infiltration of neutrophils and macrophages in periarticular tissues, suggesting SLPI's role in immune regulation. The authors strengthened their findings by demonstrating a direct interaction between SLPI and B. burgdorferi through BASEHIT library screening and FACS analysis. Further investigation of SLPI as a target could lead to valuable clinical applications.

      The conclusions of this paper are mostly well supported by data, and the authors were responsive to the reviewers' comments.

      Comments on revised version:

      The authors have thoroughly addressed the previous concerns and improved the manuscript. The revisions have strengthened the conclusions. I have no additional suggestions for improvement and recommend this manuscript for publication.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript by Yu and coworkers investigates the potential role of Secretory leukocyte protease inhibitor (SLPI) in Lyme arthritis. They show that, after needle inoculation of the Lyme disease (LD) agent, B. burgdorferi, compared to wild type mice, a SLPI-deficient mouse suffers elevated bacterial burden, joint swelling and inflammation, pro-inflammatory cytokines in the joint, and levels of serum neutrophil elastase (NE). They suggest that SLPI levels of Lyme disease patients are diminished relative to healthy controls. Finally, they find that SLPI may interact directly with B. burgdorferi.

      Strengths:

      Many of these observations are interesting and the use of SLPI-deficient mice is useful (and has not previously been done).

      Weaknesses:

      (a) The known role of SLPI in dampening inflammation and inflammatory damage by inhibition of NE makes the enhanced inflammation in the joint of B. burgdorferi-infected mice a predicted result; (b) The potential contribution of the greater bacterial burden to the enhanced inflammation is acknowledged but not experimentally addressed; (c) The relationship of SLPI binding by B. burgdorferi to the enhanced disease of SLPI-deficient mice is not addressed in this study, making the inclusion of this observation in this manuscript incomplete; and (d) assessment of SLPI levels in healthy controls vs. Lyme disease patients is inadequate.

      Comments on revised version:

      Several of the points were addressed in the revised manuscript, but the following issues remain:

      Previous point that the relationship of SLPI binding to B. burgdorferi to the enhanced disease of SLPI-deficient mice is not investigated: The authors indicate that such investigations are ongoing. In the absence of any findings, I recommend that their interesting BASEHIT and subsequent studies be presented in a future study, which would have high impact.

      Previous recommendation 1: (The authors added lines 267-68, not 287-68). This ambiguity is acknowledged but remains. In addition, in the revised manuscript, the authors state "However, these data also emphasize the importance of SLPI in controlling the development of inflammation in periarticular tissues of B. burgdorferi-infected mice." Given acknowledged limitations of interpretation, "suggest" would be more appropriate than "emphasize".

      Previous recommendation 5: The lack of clinical samples can be a challenge. Nevertheless, 4 of the 7 samples from LD patients are from individuals suffering from EM rather than arthritis (i.e., the manifestation that is the topic of the study), and some individuals were sampled multiple times, making an objective statistical comparison difficult. I don't have a suggestion as to how to address the difference in number of samples from a given subject. However, the authors could consider segregating EM vs. LA in their analysis (although it appears that limiting the comparison between HC and LA patients would not reveal a statistical difference).

      Previous recommendation 6: Given that binding of SLPI to the bacterial surface is an essential aspect of the authors' model, and that the ELISA assay to indicate SLPI binding used cell lysates rather than intact bacteria, a control PI staining to validate the integrity of bacteria seems reasonable.

      Previous recommendation 8: The inclusion of a no serum control (that presumably shows 100% viability) would validate the authors' assertion that 20% serum has bactericidal activity.

    4. Reviewer #3 (Public review):

      Summary:

      The authors investigated the role of secretory leukocyte protease inhibitors (SLPI) in developing Lyme disease in mice infected with Borrelia burgdorferi. Using a combination of histological, gene expression, and flow cytometry analyses, they demonstrated significantly higher bacterial burden and elevated neutrophil and macrophage infiltration in SLPI-deficient mouse ankle joints. Furthermore, they also showed direct interaction of SLPI with B. burgdorferi, which likely depletes the local environment of SLPI and causes excessive protease activity. These results overall suggest ankle tissue inflammation in B. burgdorferi-infected mice is driven by unchecked protease activity.

      Strengths:

      Utilizing a comprehensive suite of techniques, this is the first study showing the importance of anti-protease-protease balance in the development of periarticular joint inflammation in Lyme disease.

      Weaknesses:

      Due to the limited sample availability, the authors investigated the serum level of SLPI in both Lyme arthritis patients and patients with earlier disease manifestations. This limitation is thoroughly discussed in the manuscript.

      Comments on revised version:

      I thank the authors for considering my comments carefully.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      This study demonstrates the significant role of secretory leukocyte protease inhibitor (SLPI) in regulating B. burgdorferi-induced periarticular inflammation in mice. They found that SLPI-deficient mice showed significantly higher B. burgdorferi infection burden in ankle joints compared to wild-type controls. This increased infection was accompanied by infiltration of neutrophils and macrophages in periarticular tissues, suggesting SLPI's role in immune regulation. The authors strengthened their findings by demonstrating a direct interaction between SLPI and B. burgdorferi through BASEHIT library screening and FACS analysis. Further investigation of SLPI as a target could lead to valuable clinical applications.

      The conclusions of this paper are mostly well supported by data, but two aspects need attention:

      (1) Cytokine Analysis:

      The serum cytokine/chemokine profile analysis appears without TNF-alpha data. Given TNF-alpha's established role in inflammatory responses, comparing its levels between wild-type and infected B. burgdorferi conditions would provide valuable insight into the inflammatory mechanism.

      (2) Sample Size Concerns:

      While the authors note limitations in obtaining Lyme disease patient samples, the control group is notably smaller than the patient group. This imbalance should either be addressed by including additional healthy controls or explicitly justified in the methodology section.

      We thank the reviewer for the careful review and positive comments.

      (1) We did look into the level of TNF-alpha in both WT and SLPI-/- mice with and without B. burgdorferi infection. At the serum level, using ELISA, we did not observe any significant difference among the four groups. At the gene expression level, using RT-qPCR on the tibiotarsal tissue, we also did not observe any significant differences. Our RT-qPCR result is consistent with the previous microarray study using the whole murine joint tissue (DOI: 10.4049/jimmunol.177.11.7930). The microarray study did not show significant changes in TNF-alpha level in C57BL/6 mice following B. burgdorferi infection. A brief discussion has been added, and the above data are provided as Supplemental figure 4 in the revised manuscript, line 334-339, and 756-763.

      (2) We agree with the reviewer that the control group is smaller than the patient group. Among the archived samples that are available, the number of adult healthy controls is limited. It has been shown that the serum level of SLPI in healthy volunteers is on average about 40 ng/ml (DOI: 10.3389/fimmu.2019.00664 and 10.1097/00003246-200005000-00003). The median level in the healthy controls in our data was 38.92 ng/ml, which is comparable to the previous results. A brief discussion has been added in the revised manuscript, line 364-369.

      Reviewer #2 (Public review):

      Summary:

      This manuscript by Yu and coworkers investigates the potential role of Secretory leukocyte protease inhibitor (SLPI) in Lyme arthritis. They show that, after needle inoculation of the Lyme disease (LD) agent, B. burgdorferi, compared to wild type mice, a SLPI-deficient mouse suffers elevated bacterial burden, joint swelling and inflammation, pro-inflammatory cytokines in the joint, and levels of serum neutrophil elastase (NE). They suggest that SLPI levels of Lyme disease patients are diminished relative to healthy controls. Finally, they find that SLPI may interact directly with B. burgdorferi.

      Strengths:

      Many of these observations are interesting and the use of SLPI-deficient mice is useful (and has not previously been done).

      We appreciate the reviewer’s careful reading and positive comments.

      Weaknesses:

      (a) The known role of SLPI in dampening inflammation and inflammatory damage by inhibition of NE makes the enhanced inflammation in the joint of B. burgdorferi-infected mice a predicted result;

      We agree that the observation of the elevated NE level and the enhanced inflammation is theoretically likely. Indeed, that was the hypothesis that we explored, and often what is theoretically possible does not turn out to occur. In addition, despite the known contribution of neutrophils to the severity of murine Lyme arthritis, the importance of the neutrophil serine proteases and anti-protease has not been specifically studied, and neutrophils secrete many factors. Therefore, our data fill an important gap in the knowledge of murine Lyme arthritis development – and set the stage for the further exploration of this hypothesis in the genesis of human Lyme arthritis.

      (b) The potential contribution of the greater bacterial burden to the enhanced inflammation is not addressed;

      We agree with the reviewer’s viewpoint that the increased infection burden in the tibiotarsal tissue of the infected SLPI-/- mice could contribute to the enhanced inflammation. A brief discussion of this possibility has been added in the revised manuscript, line 287-288.

      (c) The relationship of SLPI binding by B. burgdorferi to the enhanced disease of SLPI-deficient mice is not clear; and

      We agree with the reviewer that we have not shown the importance of the SLPI-B. burgdorferi binding in the development of periarticular inflammation. It is an ongoing project in our lab to identify the SLPI binding partner in B. burgdorferi. Our hypothesis is that SLPI could bind and inhibit an unknown B. burgdorferi virulence factor that contributes to murine Lyme arthritis. A brief discussion has been added in the revised manuscript, line 401-407.

      (d) Several methodological aspects of the study are unclear.

      We appreciate the critique. We have modified the methods section in greater detail in the revised manuscript.

      Reviewer #3 (Public review):

      Summary:

      The authors investigated the role of secretory leukocyte protease inhibitors (SLPI) in developing Lyme disease in mice infected with Borrelia burgdorferi. Using a combination of histological, gene expression, and flow cytometry analyses, they demonstrated significantly higher bacterial burden and elevated neutrophil and macrophage infiltration in SLPI-deficient mouse ankle joints. Furthermore, they also showed direct interaction of SLPI with B. burgdorferi, which likely depletes the local environment of SLPI and causes excessive protease activity. These results overall suggest ankle tissue inflammation in B. burgdorferi-infected mice is driven by unchecked protease activity.

      Strengths:

      Utilizing a comprehensive suite of techniques, this is the first study showing the importance of anti-protease-protease balance in the development of periarticular joint inflammation in Lyme disease.

      We greatly appreciate the reviewer’s careful reading and positive comments.

      Weaknesses:

      Due to the limited sample availability, the authors investigated the serum level of SLPI in both Lyme arthritis patients and patients with earlier disease manifestations.

      We agree with the reviewer that it would be ideal to have more samples from Lyme arthritis patients. However, among the available archived samples, samples from Lyme arthritis patients are limited. For the samples from patients with single EM, the symptoms persisted into 3-4 months after diagnosis, the same timeframe in which acute arthritis develops. A brief discussion has been added in the revised manuscript, line 364-369.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) In Figure 2, for histological scoring, do they have similar n numbers?

      In panel B, 20 infected WT mice and 19 infected SLPI-/- mice were examined. In panel D, 13 infected WT and SLPI-/- mice were examined. Without infection, WT and SLPI-/- mice do not develop spontaneous arthritis. Due to the slow breeding of the SLPI-/- mice, a small number of uninfected control animals were used. All the supporting data values are provided in the supplemental excel.

      (2) In Figure 3, for macrophage population analysis, maybe consider implementing Ly6G-negative gating strategy to prevent neutrophil contamination in macrophage population?

      We appreciate the reviewer’s suggestion. We have analyzed the data using the Ly6G-negative gating strategy and provided the result in Supplemental figure 1. The two gating strategies showed consistent results: a significantly higher percentage of infiltrating macrophages in the tibiotarsal tissue from infected SLPI-/- mice (line 154-158, line 726-729).

      Reviewer #2 (Recommendations for the authors):

      (1) The investigators should address the possibility that much of the enhanced inflammatory features of infected SLPI-deficient mice are simply due to the higher bacterial load in the joint.

      We agree with the reviewer’s viewpoint that the increased infection burden in the tibiotarsal tissue of the infected SLPI-/- mice could contribute to the enhanced inflammation. A brief discussion of this possibility has been added in the revised manuscript, line 287-288.

      (2) Fig. 1. (A) There is no statistically significant difference in the bacterial load in the heart or skin, in contrast to the tibiotarsal joint. It would be of interest to know whether other tissues that are routinely sampled to assess the bacterial load, such as injection site, knee, and bladder, also harbored increased bacterial load in SLPI-deficient mice. (B) Heart and joint burden were measured at "21-28" days. The two time points should be analyzed separately rather than pooled.

      (A) We appreciate the reviewer’s suggestion. We agree that looking into the infection load in other tissues is helpful. However, studies into murine Lyme arthritis have been predominantly focused on tibiotarsal tissue, which displays the most consistent and prominent swelling that’s easy to observe and measure. Thus, we focused on the tibiotarsal joint in our study. (B) We collected the heart and joint tissue approximately 3 weeks post infection within a 3-day window based on the feasibility and logistics of the laboratory. By “21-28 d”, we meant between 21 and 24 days post infection. We apologize for the mislabeling, and it has been corrected in the revised manuscript. In the methods, we defined the timeframe as “Mice were euthanized approximately 3-week post infection within a 3-day window (between 21 to 24 dpi) based on the feasibility and logistics of the laboratory”, line 464-466. In the results and figure legend, we corrected it as “between 21 to 24 dpi”.

      (3) Fig. 2. (A) The same ambiguity as to the days post-infection as cited above in Point 2B exists in this figure. (B) Panel B: Caliper measurements to assess joint swelling should be utilized rather than visual scoring. (In addition, the legend should make clear that the black circles represent mock-infected mice.)

      (A) The histology scoring and histopathology examination were performed at the same time as heart and joint tissue collection, approximately 3 weeks post infection within a 3-day window based on the feasibility and logistics of the laboratory. We apologize for the mislabeling, and it has been corrected in the revised manuscript. (B) We appreciate the reviewer’s suggestion. However, our extensive experience is that caliper measurement can alter the assessment of swelling by placing pressure on the joints and does not produce consistent results. Double-blinded scoring was thus performed. Histopathology examination was performed by an independent pathologist, confirmed the histology score, and provided additional measurements.

      (4) Fig. 3. (A) See Point 2B. (B) For Panels C-E, uninfected controls are lacking.

      We apologize for this omission. Uninfected controls have been provided in Figure 3 in the revised manuscript.

      (5) Fig. 4. Some LD subjects were sampled multiple times (5 samples from 3 subjects with Lyme arthritis; 13 samples from 4 subjects with EM), and samples from same individuals apparently are treated as biological replicates in the statistical analysis. In contrast, the 5 healthy controls were each sampled only once.

      We agree with the reviewer that the control group is smaller than the patient group. Among the archived samples that are available, the number of adult healthy controls is limited, and each was sampled once. We used these samples to establish the baseline level of SLPI in the serum. It has been shown that the serum level of SLPI in healthy volunteers is on average about 40 ng/ml (DOI: 10.3389/fimmu.2019.00664 and 10.1097/00003246-200005000-00003). The median level in the healthy controls in our data was 38.92 ng/ml, which is comparable to the previous results. A brief discussion has been added in the revised manuscript, line 364-369.

      (6) Fig. 5. (A) Panel A: does binding occur when intact bacteria are used? (B) Panels B, C: Were bacteria probed with PI to indicate binding likely to occur to surface? How many biological replicates were performed for each panel? Is "antibody control" a no SLPI control? What is the blue line?

      Actively growing B. burgdorferi were collected and used for binding assays. We do not permeabilize the bacteria for flow cytometry. Thus, all the binding detected occurs at the bacterial surface. Three biological replicates were performed for each panel. The antibody control is a no-SLPI control. For panel D, the bacteria were stained with Hoechst, which shows the morphology of bacteria. We apologize for the missing information. A complete and detailed description of Figure 5 has been provided in both methods and figure legend in the revised manuscript.

      (7) Sup Fig. 1. (A) Panel A: Was this experiment performed multiple times? I.e., how many biological replicates? (B) Panel B: Strain should be specified.

      The binding assay to B. burgdorferi B31A was performed two times. In panel B, B. burgdorferi B31A3 was used. We apologize for the missing information. A complete and detailed description has been provided in the figure legend in the revised manuscript. 

      (8) Fig. S2. It is not clear that the condition (20% serum) has any bactericidal activity, so the potential protective activity of SLPI cannot be determined. (Typical serum killing assays in the absence of specific antibody utilized 40% serum.)

      In Fig. S2, panel B, the first two bars (without SLPI, with 20% WT anti serum) showed around 40% viability. It indicates that the 20% WT anti serum has bactericidal activity. Serum was collected from B. burgdorferi-infected WT mice at 21 dpi, which should contain polyclonal antibody against B. burgdorferi.

      Reviewer #3 (Recommendations for the authors):

      It was a pleasure to review! I congratulate the authors on this elegant study. I think the manuscript is very well-written and clearly conveys the research outcomes. I only have minor suggestions to improve the readability of the text.

      We greatly appreciate the reviewer’s recognition of our work.

      Line 92: Please briefly summarize the key results of the study at the end of the introduction section.

      We appreciate the reviewer’s suggestion. A brief summary has been added in the revised manuscript, line 93-103.

      Line 108: Why did significant inflammation occur only in the ankle joints of SLPI-/- mice? Could you please provide a brief explanation?

      The inflammation may also happen in other joints of the B. burgdorferi-infected SLPI-/- mice, but this has not been studied. The study of murine Lyme arthritis has been predominantly done in the tibiotarsal tissue, which displays the most prominent swelling that’s easy to observe and measure. Thus, we focused on the tibiotarsal joint in our study.

      Line 136: Please also include the gene names in Figure 3.

      We apologize for the omission. Gene names have been included in the figure legend in the revised manuscript.

      Line 181: Please briefly introduce BASEHIT. Why did you use this tool? What are the benefits?

      We appreciate the reviewer’s suggestion. We have provided a brief introduction on BASEHIT in the revised manuscript, line 216-218.

    1. eLife Assessment

      This study presents valuable findings with practical and theoretical implications for drug discovery, particularly in the context of repurposing cipargamin (CIP) for the treatment of Babesia spp. The evidence is solid with the methods, data and analyses broadly supporting the claims. The paper will be of great interest to scientists in drug discovery, computational biology, and microbiology.

    2. Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors have tried to repurpose cipargamin (CIP), a known drug against Plasmodium and Toxoplasma, against Babesia. They proved the efficacy of CIP on Babesia in the nanomolar range. In silico analyses revealed the drug resistance mechanism through a single amino acid mutation at amino acid position 921 on the ATP4 gene of Babesia. Overall, the conclusions drawn by the authors are well justified by their data. I believe this study opens up a novel therapeutic strategy against babesiosis.

      Strengths:

      The authors have carried out a comprehensive study. All the experiments performed were carried out methodically and logically.

    3. Reviewer #3 (Public review):

      Summary:

      The authors aim to establish that cipargamin can be used for the treatment of infection caused by Babesia organisms.

      Strengths:

      The study provides strong evidence that cipargamin is effective against various Babesia species. In vitro growth assays were used to establish that cipargamin is effective against Babesia bovis and Babesia gibsoni. Infection of mice with Babesia microti demonstrated that cipargamin is as effective as the combination of atovaquone plus azithromycin. Cipargamin protected mice from lethal infection with Babesia rodhaini. Mutations that confer resistance to cipargamin were identified in the gene encoding ATP4, a P-type Na ATPase that is found in other apicomplexan parasites, thereby validating ATP4 as the target of cipargamin. A 7-day treatment of cipargamin, when combined with a single dose of tafenoquine, was sufficient to eradicate Babesia microti in a mouse model of severe babesiosis caused by lack of adaptive immunity.

      Weaknesses:

      Cipargamin was tested in vivo at a single dose administered daily for 7 days. Despite the prospect of using cipargamin for the treatment of human babesiosis, there was no attempt to identify the lowest dose of cipargamin that protects mice from Babesia microti infection. In the SCID mouse model, cipargamin was tested in combination with tafenoquine but not with atovaquone and/or azithromycin, although the latter combination is often used as first-line therapy for human babesiosis caused by Babesia microti.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors address an important issue in Babesia research by repurposing cipargamin (CIP) as a potential therapeutic against selective Babesia spp. In this study, CIP demonstrated potent in vitro inhibition of B. bovis and B. gibsoni with IC<sub>50</sub> values of 20.2 ± 1.4 nM and 69.4 ± 2.2 nM, respectively, and showed in vivo efficacy against Babesia spp. in a mouse model. The authors identified two key resistance mutations in the BgATP4 gene (BgATP4<sup>L921I</sup> and BgATP4<sup>L921V</sup>) and explored their implications through phenotypic characterization of the parasite using cell biological experiments, complemented by in silico analysis. Overall, the findings are promising and could significantly advance Babesia treatment strategies.

      Strengths:

      In this manuscript, the authors effectively repurpose cipargamin (CIP) as a potential treatment for Babesia spp. They provide compelling in vitro and in vivo data showing strong efficacy. Key resistance mutations in the BgATP4 gene are identified and analyzed through both phenotypic and in silico methods, offering valuable insights for advancing treatment strategies.

      Thank you for your insightful comments and for taking the time to review our manuscript.

      Weaknesses:

      The manuscript explores important aspects of drug repurposing and rational drug design using cipargamin (CIP) against Babesia. However, several weaknesses should be addressed. The study lacks novelty as similar research on cipargamin has been conducted, and the experimental design could be improved. The rationale for choosing CIP over other ATP4-targeting compounds is not well-explained. Validation of mutations relies heavily on in silico predictions without sufficient experimental support. The Ion Transport Assay has limitations and would benefit from additional assays like Radiolabeled Ion Flux and Electrophysiological Assays. Also, the study lacks appropriate control drugs and detailed functional characterization. Further clarity on mutation percentages, additional safety testing, and exploration of cross-resistance would strengthen the findings.

      We appreciate your feedback and the chance to improve our paper. Below, we specify one by one how we addressed each comment. We hope these revisions address your concerns.

      Comment 1: It is commendable to explore drug repurposing, drug deprescribing, drug repositioning, and rational drug design, especially using established ATP4 inhibitors that are well-studied in Plasmodium and other protozoan parasites. While the study provides some interesting findings, it appears to lack novelty, as similar investigations of cipargamin on other protozoan parasites have been conducted. The study does not introduce new concepts, and the experimental design could benefit from refinement to strengthen the results. Additionally, the rationale for choosing CIP over other MMV compounds targeting ATP4 is not clearly articulated. Clarifying the specific advantages CIP may offer against Babesia would be beneficial. Finally, the validation of the identified mutations might be strengthened by additional experimental support, as reliance on in silico predictions alone may not fully address the functional impact, particularly given the potential ambiguity of the mutations (BgATP4 L to V and I).

      Thank you for your thoughtful feedback. We have addressed the concerns as follows: (1) Introduction of new concepts and experimental design: While our study primarily builds on existing frameworks, it provides novel insights into the interaction of CIP with Babesia parasites, which we believe contribute to the field. Regarding the experimental design, we acknowledge its limitations and have revised the manuscript to include additional experiments to strengthen the robustness of our findings. Specifically, we have added experiments on the detection of BgATP4-associated ATPase activity (Figure 3H), the evaluation of cross-resistance to antibabesial agents (Figures 5A and 5B), and the efficacy of CIP plus TQ combination in eliminating B. microti infection with no recrudescence in SCID mice (Figure 5C).

      (2) Rationale for choosing CIP over other MMV compounds targeting ATP4: We appreciate this point and have expanded the introduction section to articulate our rationale for selecting CIP (Lines 94-97). Specifically, CIP was chosen due to its previously demonstrated efficacy against Plasmodium and other protozoan parasites.

      (3) Validation of identified mutations: We agree that additional experimental data would strengthen the validation of the identified mutations. In response, we have determined the ratio of wild-type to mutant parasites by Illumina NovaSeq6000 sequencing to validate the impact of the BgATP4 C-to-G and C-to-A mutations (Figure 2D).

      Comment 2: Conducting an Ion Transport Assay is useful but has limitations. Non-specific binding or transport by other cellular components can lead to inaccurate results, causing false positives or negatives and making data interpretation difficult. Indirect measurements, like changes in fluorescence or electrical potential, can introduce artifacts. To improve accuracy, consider additional assays such as:

      a. Radiolabeled Ion Flux Assay: tracks the movement of Na<sup>+</sup> using radiolabeled ions, providing direct evidence of ion transport.

      b. Electrophysiological Assay: measures ionic currents in real-time with patch-clamp techniques, offering detailed information about ATP4 activity.

      Thank you for highlighting the limitations of the ion transport assay and suggesting alternative approaches to improve accuracy. However, these assays require specialized equipment and expertise not currently available in our laboratory. We have acknowledged these limitations and included the alternative methods as part of the study's future directions. Thank you for your suggestions, which will undoubtedly enhance the rigor and depth of our research.

      Comment 3: In-silico predictions can provide plausible outcomes, but it is essential to evaluate how the recombinant purified protein and ligand interact and function at physiological levels. This aspect is currently missing and should be included. For example, incorporating immunoprecipitation and ATPase activity assays with both wild-type and mutant proteins, as well as detailed kinetic studies with Cipargamin, would be recommended to validate the findings of the study.

      Thank you for your insightful suggestions regarding the validation of in-silico predictions. We recognize the importance of evaluating the interaction and function of recombinant purified proteins and ligands at physiological levels to strengthen the study's findings.

      (1) Incorporating experimental validation:

      a. Immunoprecipitation assays: We agree that immunoprecipitation could provide valuable evidence of protein-ligand interactions. While this was not included in the current study due to limitations in sample availability, we plan to incorporate this assay in follow-up experiments.

      b. ATPase activity assays: Assessing ATPase activity in both wild-type and mutant proteins is a crucial step in validating the functional impact of the identified mutations. We included the results in the revised manuscript (Figure 3H).

      (2) Detailed kinetic studies with cipargamin: We appreciate the recommendation to conduct detailed kinetic analyses. These studies would provide deeper insights into the binding affinity and inhibition dynamics of cipargamin. We have included the results of these experiments in the current study (Figure 3I).

      Comment 4: The study lacks specific suitable control drugs tested both in vitro and in vivo. For accurate drug assessment, especially when evaluating drugs based on a specific phenotype, such as enlarged parasites, it is important to use ATP4 gene-specific inhibitors. Including similar classes of drugs, such as Aminopyrazoles, Dihydroisoquinolines, Pyrazoleamides, Pantothenamides, Imidazolopiperazines (e.g., GNF179), and Bicyclic Azetidine Compounds, would provide more comprehensive validation.

      Thank you for emphasizing the importance of including suitable control drugs. We acknowledge the absence of specific control drugs in the previous version of the manuscript. To date, no drug targeting ATP4 proteins in Babesia has been definitively identified. The suggested drugs could potentially disrupt the parasite's ability to regulate sodium levels by inhibiting PfATP4, a protein essential for its survival. This highlights PfATP4 as an attractive target for antimalarial drug development. However, further studies are required to evaluate whether these drugs exhibit similar activity against ATP4 homologs in Babesia.

      Comment 5: Functional characterization of CIP through microscopic examination and quantification for assessing parasite size enlargement is not entirely reliable. A Flow Cytometry-Based Assay is recommended instead (along with suitable control antiparasitic drugs). To effectively monitor Cipargamin's action, conducting time-course experiments with 6-hour intervals is advisable rather than relying solely on endpoint measurements. Additionally, for accurate assessment of parasite morphology, obtaining representative qualitative images using Scanning Electron Microscopy (SEM) or Transmission Electron Microscopy (TEM) for treated versus untreated samples is recommended for precise measurements.

      Thank you for your constructive feedback regarding the methods for functional characterization of CIP and the evaluation of parasite morphology.

      (1) Flow Cytometry-Based Assay: We agree that a flow cytometry-based assay would enhance the accuracy of detecting changes in parasite size and morphology. We will implement this method in future studies as our laboratory currently does not have the capability to conduct such experiments.

      (2) Microscopy for Morphology Assessment: We acknowledge the importance of obtaining high-resolution, representative images of treated and untreated samples. Utilizing Scanning Electron Microscopy (SEM) or Transmission Electron Microscopy (TEM) for qualitative analysis will significantly improve the precision of our morphological assessments. However, both methods have limitations.

      a. SEM: This technique can only scan the erythrocytes' surface; it cannot scan the parasite itself because it is inside the erythrocytes.

      b. TEM: Since the parasite is fixed, observations from various angles may reveal longitudinal or cross-sectional portions, making it impossible to precisely view the parasite's dimensions. As a result, we employed TEM to precisely observe the parasite's internal structure alterations both before and after treatment, as seen in Figure 3C.

      Comment 6: A notable contradiction observed is that mutant cells displayed reduced efficacy and affinity but more pronounced phenotypic effects. The BgATP4<sup>L921I</sup> mutation shows a 2x lower susceptibility (IC<sub>50</sub> of 887.9 ± 61.97 nM) and a predicted binding affinity of -6.26 kcal/mol with CIP. However, the phenotype exhibits significantly lower Na<sup>+</sup> concentration in BgATP4<sup>L921I</sup> (P = 0.0087) (Figure 3E).

      The seemingly contradictory observation of reduced CIP binding and efficacy in the BgATP4<sup>L921I</sup> mutant alongside a significant decrease in intracellular Na<sup>+</sup> concentration may be explained by factors other than the direct CIP interaction. CIP binds less effectively to its target in the BgATP4<sup>L921I</sup> mutant, but the observed phenotype may be attributed to the functional consequences of the mutation. The BgATP4<sup>L921I</sup> mutation probably directly impacts BgATP4's ion transport mechanism, which likely disrupts Na<sup>+</sup> homeostasis independently. Thus, we hypothesize that the dysregulated Na<sup>+</sup> homeostasis is driven by the mutation itself rather than by the already weakened inhibitory effect of CIP.

      Comment 7: The manuscript does not clarify the percentage of mutations, and the number of sequence iterations performed on the ATP4 gene. It is also unclear whether clonal selection was carried out on the resistant population. If mutations are not present in 100% of the resistant parasites, please indicate the ratio of wild-type to mutant parasites and represent this information in the figure, along with the chromatograms.

      Thank you for your valuable comments. We appreciate your detailed observations and the opportunity to clarify these points. During the long-term culture process, subculturing was performed every three days. Although clonal selection was not conducted, mutant strains were effectively selected during this process. High-throughput next-generation sequencing on the Illumina NovaSeq6000 platform was performed to determine the ratio of wild-type to mutant parasites. Results showed that for BgATP4<sup>L921V</sup>, 99.97% of 7,960 reads were G, and for BgATP4<sup>L921I</sup>, 99.92% of 7,862 reads were A. To enhance clarity, we have included a new figure (Figure 2D) illustrating the sequencing results. We believe this addition will give readers a clearer understanding.
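
      As an illustration of the arithmetic behind such percentages, the sketch below computes the share of reads that carry the variant base at a position. It is a minimal example under stated assumptions, not the authors' pipeline: the function name and the example counts are hypothetical, since the response reports only percentages and read depths.

```python
def variant_read_percentage(variant_reads: int, total_reads: int) -> float:
    """Return the percentage of reads at a position that carry the variant base."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= variant_reads <= total_reads:
        raise ValueError("variant_reads must lie between 0 and total_reads")
    return 100.0 * variant_reads / total_reads


# Hypothetical counts for illustration only (not taken from the study).
example_variant_reads = 9_990
example_total_reads = 10_000
pct = variant_read_percentage(example_variant_reads, example_total_reads)
print(f"{pct:.2f}% variant, {100.0 - pct:.2f}% wild-type")
```

      A value close to 100% indicates that the population is almost entirely mutant even without clonal selection.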

      Comment 8: While the compound's toxicity data is well-established, it is advisable to include additional testing in epithelial cells and liver-specific cell lines (e.g., HeLa, HCT, HepG2) if feasible for the authors. This would provide a more comprehensive assessment of the compound's safety profile.

      Thank you for your thoughtful suggestion. We included toxicity testing in human foreskin fibroblasts (HFF) as supplemental toxicity data to provide a more comprehensive evaluation of the compound's safety profile (Figure supplement 1B).

      Comment 9: In the in vivo efficacy study, recrudescent parasites emerged after 8 days of treatment. Did these parasites harbor the same mutation in the ATP4 gene? The authors did not investigate this aspect, which is crucial for understanding the basis of recrudescence.

      Thank you for raising this important point. We acknowledge that understanding the genetic basis of recrudescence is critical for elucidating mechanisms of resistance and treatment failure. Although our current study did not include an analysis of the BrATP4 gene in relapsed parasites due to limitations in sample availability, we evaluated CIP efficacy in SCID mice and performed sequencing analysis of the BmATP4 gene in recrudescent samples. However, no mutations were identified (Lines 211-212). We believe that if a relapse occurs after the 7-day treatment, it is unlikely that the parasites would easily acquire mutations.

      Comment 10: The authors should explain their choice of BALB/c mice for evaluating CIP efficacy, as these mice clear the infection and may not fully represent the compound's effectiveness. Investigating CIP efficacy in SCID mice would be valuable, as they provide a more reliable model and eliminate the influence of the immune system. The rationale for not using SCID mice should be clarified.

      We appreciate the reviewer's suggestion regarding the use of SCID mice to evaluate the efficacy of CIP. In response to your suggestion, we have now included an experiment using SCID mice to evaluate the efficacy of CIP and to eliminate the confounding influence of the immune system. We further investigated the potential of combined administration of CIP plus TQ to eliminate parasites, as we are concerned that the long-term use of CIP as a monotherapy may be limited due to its potential for developing resistance. The results are shown in Figure 5C.

      Comment 11: Do the in vitro-resistant parasites show any potential for cross-resistance with commonly used antiparasitic drugs? Have the authors considered this possibility, and what are their expectations regarding cross-resistance?

      Thank you for your insightful question regarding the potential for cross-resistance between in vitro-resistant parasites and commonly used antiparasitic drugs. In response to your suggestion, we have now included experiments to assess whether B. gibsoni parasites that are resistant to CIP exhibit any cross-resistance to other commonly used antiparasitic drugs, such as atovaquone (ATO) and tafenoquine (TQ). The IC<sub>50</sub> values for both ATO and TQ in the resistant strains showed only slight changes compared to the wild-type strain, with less than a onefold difference (Figure 5A, 5B). This minimal variation suggests that the resistant strain has a mild alteration in susceptibility to ATO and TQ, but not enough to indicate strong resistance or significant cross-resistance. This suggests that CIP could be used in combination with TQ to treat babesiosis.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors have tried to repurpose cipargamin (CIP), a drug known to act against Plasmodium and Toxoplasma, for use against Babesia. They proved the efficacy of CIP on Babesia in the nanomolar range. In silico analyses revealed the drug resistance mechanism through a single amino acid mutation at amino acid position 921 on the ATP4 gene of Babesia. Overall, the conclusions drawn by the authors are well justified by their data. I believe this study opens up a novel therapeutic strategy against babesiosis.

      Strengths:

      The authors have carried out a comprehensive study. All the experiments performed were carried out methodically and logically.

      Thank you for the comments and your time to review our manuscript.

      Weaknesses:

      The introduction section needs to be more informative. The authors are investigating the binding of CIP to the ATP4 gene, but they did not give any information about the gene or how the ATP4 inhibitors work in general. The resolution of the figures is not good and the font size is too small to read properly. I also have several minor concerns which have been addressed in the "Recommendations for the authors" section.

      We thank the reviewer for their valuable comments. In response, we have revised the introduction to include a more detailed explanation of the ATP4 gene, its biological significance, and the mechanism of ATP4 inhibitors to provide better context for the study (Lines 86-93). Additionally, we have reformatted the figures to enhance resolution and increased the font size to improve readability. We also appreciate the reviewer's careful assessment of the manuscript and have addressed all minor concerns outlined in the "Recommendations for the Authors" section. A detailed, point-by-point response to each concern is provided in the response letter, and the corresponding revisions have been incorporated into the manuscript.

      Reviewer #3 (Public review):

      Summary:

      The authors aim to establish that cipargamin can be used for the treatment of infection caused by Babesia organisms.

      Strengths:

      The study provides strong evidence that cipargamin is effective against various Babesia species. In vitro, growth assays were used to establish that cipargamin is effective against Babesia bovis and Babesia gibsoni. Infection of mice with Babesia microti demonstrated that cipargamin is as effective as the combination of atovaquone plus azithromycin. Cipargamin protected mice from lethal infection with Babesia rodhaini. Mutations that confer resistance to cipargamin were identified in the gene encoding ATP4, a P-type Na<sup>+</sup> ATPase that was found in other apicomplexan parasites, thereby validating ATP4 as the target of cipargamin.

      We appreciate the reviewer for taking the time to review our manuscript.

      Weaknesses:

      Cipargamin was tested in vivo at a single dose administered daily for 7 days. Despite the prospect of using cipargamin for the treatment of human babesiosis, there was no attempt to identify the lowest dose of cipargamin that protects mice from Babesia microti infection. Exposure to cipargamin can induce resistance, indicating that cipargamin should not be used alone but in combination with other drugs. There was no attempt at testing cipargamin in combination with other drugs, particularly atovaquone, in the mouse model of Babesia microti infection. Given the difficulty in treating immunocompromised patients infected with Babesia microti, it would have been informative to test cipargamin in a mouse model of severe immunosuppression (SCID or rag-deficient mice).

      We thank the reviewer for raising these important comments. We address each concern as follows:

      (1) Identifying the lowest protective dose of CIP:

      Although our current study was designed to assess the efficacy of CIP at a single therapeutic dose over a 7-day period, we acknowledge that identifying the lowest effective dose would provide valuable information for optimizing treatment regimens. We plan to address this in future studies by conducting a dose-response experiment to identify the minimal protective dose of CIP.

      (2) Testing CIP in combination with other drugs:

      In the current study, we have tested the efficacy of tafenoquine (TQ) combined with CIP, as well as CIP or TQ administered individually, in a mouse model of B. microti infection. Our results demonstrated that, compared with monotherapy, the combination of CIP and TQ completely eliminated the parasites within 90 days of observation (Figure 5C).

      (3) Testing in an immunocompromised mouse model:

      We agree with the reviewer that evaluating CIP in immunocompromised models is critical for understanding its potential in treating immunocompromised patients. To address this, we have conducted experiments using SCID mice infected with B. microti. Our results indicated that the combination therapy of CIP plus TQ was effective in eliminating parasites in the severely immunocompromised model (Figure 5D).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Comment 1: Table: Include the in-silico binding energies for each mutation and ligand.

      We have added binding energies for each mutation and ligand in Table supplement 3.

      Comment 2: Did the authors investigate the potential of combination therapies involving CIP?

      We have tested the efficacy of TQ combined with CIP in a mouse model of B. microti infection.

      Comment 3: Does this mutation affect the transmission of the parasite?

      Based on our observations, the growth and generation rates of the mutant strain are comparable to those of the wild-type strain. These findings suggest that the mutation does not significantly affect the spread or transmission of the parasite. We have included this observation in the revised manuscript (Lines 243-244).

      Comment 4: 60: Use abbreviations CLN for clindamycin and QUI for quinine.

      We have revised them accordingly (Lines 59-60).

      Comment 5: 86: The hypothesis is not strong or convincing; it needs to be modified to be more specific and convincing.

      We have revised the hypothesis to reflect the rationale behind the study better and to support our claim more strongly (Lines 94-97).

      Comment 6: 93: Change to: "In vitro efficacy of CIP against B. bovis and B. gibsoni.".

      We have changed the suggested content in the manuscript (Line 104).

      Comment 7: 96: Define CC<sub>50</sub>.

      We have added the definition of CC<sub>50</sub> (Line 106).

      Comment 8: 102: Change to: "...Balb/c mice increased dramatically in the...".

      We have changed the word following your recommendation (Line 114).

      Comment 9: 108: "...significant decrease at 12 DPI...".

      We have revised it according to your suggestion (Line 120).

      Comment 10: 110: "This indicates that the administration...".

      We have revised it according to your suggestion (Line 122).

      Comment 11: Figure 1:

      (1) Panels A and B should clearly indicate parasite species within the graph for better self-explanation.

      We have indicated parasite species within the graph.

      (2) For panels C, D, and E, if mice were eliminated or euthanized in the study, include a symbol in the graph to indicate this.

      For panels C and D, no mice were eliminated during the study; therefore, no symbol was added to these graphs. Panel F already provides information about the number of eliminated mice, which corresponds to the data in Panel E.

      (3) In panels C, D, and E, use a continuation arrow for drug treatment rather than a straight line, to cover the duration of the treatment.

      We have updated the figures to use continuation arrows instead of straight lines to represent the duration of drug treatment.

      Comment 12: Figure 2: The color combination for the WT and mutant curves is hard to read; consider using regular, less fluorescent, and more distinguishable colors.

      We have adjusted the color scheme to use more distinguishable and less fluorescent colors, ensuring better readability and clarity. The revised figure with the updated color scheme has been included in the updated manuscript, and we hope this resolves the readability concern.

      Comment 13: Figure 3:

      (1) Panel A: Represent a single infected iRBC rather than a field for better visualization.

      We have updated Panel A to display a single infected iRBC instead of a field.

      (2) Panels E and F: Change the color patterns, as the current colors, especially the green variants (WT and mutant L921V), are difficult to read.

      To improve readability, we have updated the color patterns for these panels by selecting more distinguishable colors with higher contrast (Figure 3F, 3G).

      Comment 14: Figure 4: Panels B, C, and D: The text is too small to read; increase the font size or change the resolution.

      We have increased the font size and replaced the panels with high-resolution versions (Figure 4B, 4C, 4D).

      Reviewer #2 (Recommendations for the authors):

      Comment 1: In the last paragraph of the introduction, the authors mentioned determining the activity of CIP in vitro in B. bovis and B. gibsoni while in vivo in B. microti and B. rodhaini. It is not explained why they are testing the in vitro and in vivo effects on different Babesia species. Could you please add some logic there? Also, why did they mention measuring the inhibitory activity of CIP by monitoring the Na<sup>+</sup> and H<sup>+</sup> balance? This part needs to be rewritten with more information. The ATP4 gene is not properly introduced in the manuscript.

      We thank the reviewer for raising these important points. Below, we address each aspect of the comment in detail:

      (1) Rationale for testing different Babesia spp. in vitro and in vivo:

      B. bovis and B. gibsoni are well-established Babesia models for in vitro culture systems, allowing evaluation of CIP's inhibitory activity under controlled laboratory conditions. B. microti and B. rodhaini, on the other hand, are commonly used rodent models for the in vivo studies of babesiosis, enabling the assessment of drug efficacy in a mammalian host system. This multi-species approach provides a comprehensive evaluation of CIP's efficacy across Babesia spp. with different biological characteristics.

      (2) Measuring CIP's inhibitory activity via Na<sup>+</sup> and H<sup>+</sup> balance:

      We acknowledge that this section of the introduction requires more context. The revised manuscript now includes additional information explaining that the ATP4 gene, which encodes a Na<sup>+</sup>/H<sup>+</sup> transporter, is the proposed target of CIP (Lines 86-93). CIP disrupts the ion homeostasis maintained by ATP4, leading to an imbalance in Na<sup>+</sup> and H<sup>+</sup> concentrations. Monitoring these ionic changes provides a mechanistic understanding of CIP's mode of action and its impact on parasite viability. This rationale has been expanded in the introduction to clarify its significance.

      Comment 2: The figure fonts are too small. The resolution for the images is also poor.

      We have increased the font size in all figures to improve readability. Additionally, we have replaced the figures with high-resolution versions to ensure clarity and visual quality.

      Comment 3: Figures 1A and 1B: one of the error bars merged to the X-axis legend. Please modify these panels. Which curve was used to determine the IC<sub>50</sub> values (although it's mentioned in the methods section, would it be better to have the information in the figure legends as well)?

      We thank the reviewer for their comments regarding Figures 1A and 1B.

      (1) Error bars overlapping the X-axis legend:

      The error bars in the figures were automatically generated using GraphPad Prism9 based on the data and are determined by the values themselves. Unfortunately, this overlap cannot be avoided without altering the data representation.

      (2) IC<sub>50</sub> curve information:

      To clarify the determination of IC<sub>50</sub> values, we have already included gray dashed lines in the graphs to indicate where the IC<sub>50</sub> values were derived from the curves. This visual representation provides clear information about the IC<sub>50</sub> points.
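
      For readers unfamiliar with reading an IC<sub>50</sub> off a dose-response curve, the sketch below estimates the concentration at which inhibition crosses 50% by log-linear interpolation between the two bracketing doses. This is a simplified stand-in and an assumption on our part: it is not the curve-fitting procedure used in GraphPad Prism, and the function name and example data are hypothetical.

```python
import math


def interpolate_ic50(concentrations, inhibitions):
    """Estimate IC50 by log-linear interpolation across the 50% inhibition crossing.

    concentrations: positive drug concentrations in ascending order
    inhibitions: percent growth inhibition measured at each concentration
    """
    if len(concentrations) != len(inhibitions):
        raise ValueError("concentrations and inhibitions must have equal length")
    if any(c <= 0 for c in concentrations):
        raise ValueError("concentrations must be positive")
    points = list(zip(concentrations, inhibitions))
    for (c_lo, y_lo), (c_hi, y_hi) in zip(points, points[1:]):
        if y_lo <= 50.0 <= y_hi and y_hi > y_lo:
            # Position of the 50% level between the two bracketing responses.
            t = (50.0 - y_lo) / (y_hi - y_lo)
            log_ic50 = math.log10(c_lo) + t * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_ic50
    raise ValueError("50% inhibition is not bracketed by the data")


# Hypothetical dose-response data (nM, % inhibition), for illustration only.
doses = [1.0, 10.0, 100.0, 1000.0]
responses = [5.0, 25.0, 75.0, 95.0]
print(f"Estimated IC50: {interpolate_ic50(doses, responses):.1f} nM")
```

      A fitted sigmoidal model uses all data points and is generally preferable; the interpolation only shows what the dashed lines on such a graph represent.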

      Comment 4: Supplementary Figure 1: what are MDCK cells? What is CC<sub>50</sub>? Please mention their full forms in the text and figure legends (they should be described here because the methods section comes later). What is meant by a predicted selectivity index? There should be an explanation of why and how they did it. Which curve was used to determine the IC<sub>50</sub> values?

      We thank the reviewer for pointing out the need to clarify terms and provide additional context in the supplementary figure and text. We have updated the figure legend and text to include the full forms of MDCK (Madin-Darby canine kidney) cells and CC<sub>50</sub> (50% cytotoxic concentration), ensuring clarity for readers encountering these terms for the first time. In the text, we have now included a brief explanation of the selectivity index as a measure of a drug's safety and specificity (Lines 108-110). The selectivity index is calculated as the ratio of the 50% cytotoxic concentration (CC<sub>50</sub>) to the half maximal inhibitory concentration (IC<sub>50</sub>) (Lines 333-335). We have also included gray dashed lines in the graphs to indicate where the IC<sub>50</sub> values were derived from the curves (Figure supplement 1).
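
      The calculation can be sketched as follows, assuming the conventional definition of the selectivity index as CC<sub>50</sub> divided by IC<sub>50</sub>, with both values in the same unit. The function name and the example values are hypothetical and are not taken from the manuscript.

```python
def selectivity_index(cc50: float, ic50: float) -> float:
    """Return CC50 / IC50; both values must be expressed in the same unit."""
    if cc50 <= 0 or ic50 <= 0:
        raise ValueError("cc50 and ic50 must be positive")
    return cc50 / ic50


# Hypothetical values in nM, for illustration only.
example_cc50_nm = 50_000.0  # concentration killing 50% of host cells
example_ic50_nm = 100.0     # concentration inhibiting 50% of parasite growth
print(f"Selectivity index: {selectivity_index(example_cc50_nm, example_ic50_nm):.0f}")
```

      A larger index means the compound inhibits the parasite at concentrations well below those that harm host cells.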

      Comment 5: Figures 1C-F: It feels unnecessary to write down n=6 for each panel and each group. Since "n" is equal for all, it would be nice to just mention it in the figure legend only.

      We appreciate the reviewer's suggestion regarding the notation of "n=6" in Figures 1C-F. To improve clarity and reduce redundancy, we have removed the "n=6" notation from the individual panels and included it in the figure legend instead.

      Comment 6: Figure 2A: was never mentioned in the text.

      We have described the sequencing results for the wild-type B. gibsoni ATP4 gene with a reference to Figure 2A in the revised manuscript (Lines 134-135).

      Comment 7: Figure 2D: some of the error bars merged to the X-axis legend. Please modify. Again, which curve was used to determine the IC<sub>50</sub> values? Can the authors explain why the pH declined after 4 minutes?

      We thank the reviewer for this insightful question.

      (1) Error bars overlapping the X-axis legend:

      The error bars in Figure 2E were automatically generated using GraphPad Prism9 and are determined by the underlying data values. Unfortunately, this overlap cannot be avoided without altering the data representation.

      (2) IC<sub>50</sub> curve information:

      Since Figure 2E contains three separate curves, adding dashed lines to indicate the IC<sub>50</sub> for each curve would make the figure overly cluttered and reduce readability. To address this, we have clearly indicated the IC<sub>50</sub> values in Figures 1A and 1B and described the methodology for determining IC<sub>50</sub> values in the Methods section. We believe this approach provides sufficient clarity without compromising the visual experience of Figure 2E.

      (3) The pH decline observed after 4 minutes (Figure 3E) may be attributed to the following factors:

      a. Ion transport dynamics:

      The initial rise in pH likely reflects the rapid inhibition of Na<sup>+</sup>/H<sup>+</sup> exchange mediated by CIP, which temporarily alkalinizes the intracellular environment. However, after this initial phase, compensatory mechanisms, such as proton influx or metabolic acid production, may lead to a subsequent decline in pH.

      b. Drug kinetics and target interaction:

      The decline could also result from the time-dependent effects of CIP on ATP4-mediated ion transport. As the drug action stabilizes, the parasite may partially restore ionic balance, leading to a decrease in intracellular pH.

      Comment 8: Supplementary Figure 2: It's difficult to distinguish between red and pink colors, so it would be wise to use two contrasting colors to distinguish between Pf and Tg CIP resistant sites.

      We have updated the figure to enhance clarity. Purple squares and arrows now represent sites linked to P. falciparum CIP resistance, replacing the previous red squares. Similarly, gray squares and arrows have replaced the green squares to denote sites associated with T. gondii (Figure supplement 2).

      Comment 9: Line 65: Is it possible to add a reference here?

      We have added a reference in line 65.

      Comment 10: Line 69: Please spell the full form of G6PD as it was never mentioned before.

      We have added the full form of G6PD in lines 69-70.

      Comment 11: Line 103: mention what DPI is (irrespective of the methods section which comes later).

      We have spelled out DPI (days postinfection) in line 115.

      Comment 12: Line 120: It's not explained why B. gibsoni ATP4 gene was investigated? There should be more explanation and references to previous work.

      We thank the reviewer for pointing out the need to provide more context for investigating the B. gibsoni ATP4 gene. To address this, we have added more information to the introduction, explaining that the ATP4 gene, which encodes a Na<sup>+</sup>/H<sup>+</sup> transporter, is the proposed target of CIP (Lines 86-93).

      Comment 13: Line 203-219: line spacing seems different from the rest of the manuscript.

      We have corrected the incorrect format (Lines 262-278).

      Reviewer #3 (Recommendations for the authors):

      Comment 1: Lines 66-68: The report by Marcos et al. 2022 did not demonstrate that tafenoquine was effective in curing relapsing babesiosis. In the discussion of that article, the authors state that "it is impossible to conclude that the drug tafenoquine provided any clinical benefit." The first demonstration of tafenoquine efficacy against relapsing babesiosis was reported by Rogers et al. 2023 and confirmed by Krause et al. 2024. Please rephrase the statement and use relevant citations.

      We thank the reviewer for pointing out this issue and we have rephrased the statement and used relevant citations (Lines 66-68).

      Comment 2: Line 103: mean parasitemia at 10 DPI is reported to be 35.88% but Figure 1C appears to indicate otherwise.

      We apologize for the error. The correct mean parasitemia at 10 DPI is 38.55%, and line 115 of the revised manuscript has been updated to reflect the data shown in Figure 1C.

      Comment 3: Line 116: parasitemia is said to recur on day 14 post-infection but Figure 1E indicates that recurrence was already noted on day 12 post-infection.

      We thank the reviewer for pointing out this inconsistency. We have corrected the relapse day to reflect that recurrence was noted on day 12 post-infection, as shown in Figure 1E. This correction has been made in the revised manuscript (Line 128).

      Comment 4: Line 120: Replace "wells" with "strains". Also, start the paragraph with one brief sentence to state how resistant parasites were generated.

      We have replaced "wells" with "strains" and added one brief sentence to explain how resistant parasites were generated (Lines 132-134).

      Comment 5: Line 169: is Ji et al, 2022b truly the appropriate reference to support a statement on tafenoquine?

      We thank the reviewer for highlighting this point. We have added one other reference to support a statement on tafenoquine. The IC<sub>50</sub> value of TQ was 20.0 ± 2.4 μM against B. gibsoni (Ji et al., 2022b), and 31 μM against B. bovis (Carvalho et al., 2020) (Lines 223-225).

      Comment 6: Lines 184-185: given that exposure to CIP induces mutations in the ATP4 gene and therefore resistance to CIP, what is the prospect of using CIP for the treatment of babesiosis? Can the authors speculate on whether CIP should not be used alone but rather in combination with other drugs currently used for the treatment of human babesiosis?

      We thank the reviewer for raising this important question. Given that exposure to CIP induces mutations in the ATP4 gene, leading to resistance, we acknowledge that the long-term use of CIP as a monotherapy may be limited due to the potential for resistance development. To address this concern, we investigated the combination therapy of TQ and CIP to achieve the complete elimination of B. microti in infected mice (a model for human babesiosis). The results of this study are presented in Figure 5C.

      Comment 7: Lines 258-259: it is stated that drug treatment was initiated on day 4 post-infection when mean parasitemia was 1% and that drug treatment was continued for 7 days. This is not the case for B. rodhaini infection. As reported in Figure 1E, treatment was initiated on day 2 post-infection.

      We apologize for the oversight and any confusion caused. We have corrected the statement to reflect that drug treatment for B. rodhaini-infected mice was initiated at 2 DPI, as reported in Figure 1E (Lines 347-349).

      Comment 8: Lines 282-285: RBCs are said to be exposed to CIP for 3 days but parasite size is said to be measured on day 4. Which is correct?

      We thank the reviewer for pointing out this discrepancy. To clarify, the infected erythrocytes were exposed to CIP for three consecutive days (72 hours). Blood smears were then prepared at the 73<sup>rd</sup> hour, corresponding to the fourth day.

      Comment 9: Lines 35-37: this sentence can be omitted from the abstract as it does not summarize additional insight or additional data.

      We have omitted this sentence from the abstract.

      Comment 10: Line 55: replace Drews et al. 2023 with Gray and Ogden 2021 (doi: 10.3390/pathogens10111430). This excellent article directly supports the statement made by the authors.

      We appreciate the reviewer's suggestion and have replaced the reference with Gray and Ogden, 2021 (doi: 10.3390/pathogens10111430) (Line 54).

      Comment 11: Line 55: modify the start of sentence to read "The disease is known as babesiosis ...".

      We have modified the sentence (Line 54).

      Comment 12: Line 56: rephrase to read ".... but chronic infections can be asymptomatic".

      We have modified the sentence (Line 55).

      Comment 13: Line 57: rephrase to read "The fatality rate ranges from 1% among all cases to 3% among hospitalized cases but has been as high as 20% in immunocompromised patients."

      We have rephrased the sentence (Lines 55-57).

      Comment 14: Line 61: replace Holbrook et al. 2023 with Krause et al. 2021 (doi: 10.1093/cid/ciaa1216).

      We have replaced Holbrook et al. 2023 with Krause et al. 2021 (doi: 10.1093/cid/ciaa1216) (Line 60).

      Comment 15: Line 62: rephrase to read "... cytochrome b, which is targeted by atovaquone, were identified in patients with relapsing babesiosis." Here, also cite Lemieux et al., 2016; Simon et al., 2017; Rosenblatt et al, 2021, Marcos et al., 2022; Rogers et al., 2023; Krause et al., 2024.

      We have rephrased the sentence and cited the suggested references (Lines 61-64).

      Comment 16: Line 65: rephrase "Despite its efficacy, this combination can elicit adverse drug reactions (Vannier and Krause, 2012)."

      We have rephrased the sentence (Lines 65-66).

      Comment 17: Lines 75-77: rephrase to read "... of the drug indicated that CIP taken orally had good absorption, a long half-life, and ...".

      We have rephrased the sentence (Lines 76-77).

      Comment 18: Line 79: remove "the".

      We have removed "the" (Lines 79-80).

      Comment 19: Lines 83-85: rephrase to read "Mice infected with T. gondii that were treated with CIP on the day of infection and the following day had 90% fewer parasites 5 days post-infection (Zhou et al., 2014).".

      We have rephrased the sentence (Lines 83-85).

      Comment 20: Line 90: shorten the sentence to end as follows "... of CIP on Babesia parasites.".

      We have shortened the sentence as suggested (Line 100).

      Comment 21: Line 96: spell out CC<sub>50</sub>.

      We have spelled out the full form of CC<sub>50</sub> (Line 106).

      Comment 22: Line 104: remove "of body weight".

      We have removed "of body weight" (Line 116).

      Comment 23: Line 108: delete "from 8 DPI to 24 DPI, with statistically significant decreases".

      We have deleted "from 8 DPI to 24 DPI, with statistically significant decreases" (Line 120).

      Comment 24: Line 111: start a new paragraph with the sentence "BALB/c mice infected ...".

      We have started a new paragraph with the sentence "BALB/c mice infected ..." (Line 124).

      Comment 25: Line 123: replace "showed" with "occurred".

      We have replaced "showed" with "occurred" (Line 138).

      Comment 26: Line 127: rephrase to read "... sensitivity of the resistant parasite lines ...".

      We have rephrased the sentence (Line 144).

      Comment 27: Lines 137-140: rephrase to read ".... lines were lower when compared with ..." .

      We have rephrased the sentence (Line 158).

      Comment 28: Line 149: replace "BgATP4" with "B. gibsoni ATP4".

      We have replaced "BgATP4" with "B. gibsoni ATP4" (Line 183).

      Comment 29: Line 154: spell out "pLDDT" prior to pLDDT.

      We have provided the full form of pLDDT in the revised manuscript (Line 188).

      Comment 30: Lines 165-166: rephrase to read "CIP is a novel compound that inhibits Plasmodium development by targeting ATP4 and has been ...".

      We have rephrased the sentence (Lines 219-220).

      Comment 31: Lines 171-172: rephrase to read "...AZI, the combination recommended by the CDC in the United States.".

      We have rephrased the sentence (Lines 226-227).

      Comment 32: Line 173: rephrase to read "... B. rodhaini infection, with survival up to 67%.".

      We have rephrased the sentence (Line 228).

      Comment 33: Lines 175-178: rephrase to read "In a previous study, a P. falciparum Dd2 strain that acquired resistance to CIP carried the G358S mutation in the ...".

      We have rephrased the sentence (Lines 230-231).

      Comment 34: Lines 179-180: rephrase to read "ATP4 is found in the parasite plasma membrane and is specific to the subclass of apicomplexan parasites.".

      We have rephrased the sentence (Lines 232-233).

      Comment 35: Lines 182-184: rephrase to read "In another study of Toxoplasma gondii, a cell line that carried the mutation G419S in the TgATP4 gene was 34 times ...".

      We have rephrased the sentence (Lines 235-237).

      Comment 36: Lines 201-202: delete the last sentence of this paragraph.

      We have deleted the last sentence of the paragraph (Line 261).

      Comment 37: Line 228: rephrase to read "... that CIP had a weaker binding to BgATP4<sup>L921I</sup> than to BgATP4<sup>L921V</sup>.".

      We have rephrased the sentence (Lines 294-295).

      Comment 38: Lines 261-262: please state that drugs were prepared in sesame oil. Add "20 mg/kg" in front of AZI.

      We have stated that drugs were prepared in sesame oil and added "20 mg/kg" in front of AZI (Lines 350-352).

      Comment 39: Line 265: replace "care" with "treatments".

      We have replaced "care" with "treatments" (Line 355).

      Comment 40: Line 267: replace "observe" with "assess".

      We have replaced "observe" with "assess" (Line 357).

      Comment 41: Lines 269-271: please provide the absolute numbers of B. gibsoni infected RBCs and the absolute numbers of uninfected RBCs that were added to the culture medium.

      We thank the reviewer for this suggestion. In the revised manuscript, we have included the absolute numbers of B. gibsoni-infected RBCs and uninfected RBCs added to the culture medium. Specifically, the culture medium contained 10 μL (5×10 <sup>6</sup>) B. gibsoni iRBCs mixed with 40 μL (4×10 <sup>8</sup>) uninfected RBCs (Lines 360-361).

      Comment 42: Line 279: replace "confirmed" with "identified".

      We have replaced "confirmed" with "identified" (Line 370).

      Comment 43: Figure Supplement 2: the squares are not readily visible. Could the entire column corresponding to the mutation position be highlighted?

      We thank the reviewer for this suggestion. To improve visibility, we have changed the color of the squares and added arrows to make the mutation sites as prominent as possible. Unfortunately, due to software limitations, we were unable to highlight the entire column corresponding to the mutation position.

      Comment 44: Figure Supplement 4: for the parasite that carries a mutation in BgATP4, please delete the arrows that are next to BgATP4. These arrows send the message that the mutated ATP4 has an active role in pumping Na<sup>+</sup> and H<sup>+</sup> back into their compartment, which is not the case.

      We thank the reviewer for their observation. The dotted arrows next to BgATP4 are intended to indicate the recovery of H<sup>+</sup> and Na<sup>+</sup> balance facilitated by the mutated ATP4, which reduces susceptibility to ATP4 inhibitors. To avoid potential confusion, we have revised the figure legend to clearly explain the role of the arrows, ensuring the intended message is accurately conveyed.

    1. eLife Assessment

      This important study utilizes humanized mice, in which human immune cells are introduced into immune-deficient mice, to provide convincing evidence that two helper CD4 T-cell subsets, T-follicular helper (Tfh) and T-peripheral helper (Tph) cells, are able to drive both autoantibody production and induction of autoimmunity. The work will be of broad interest to medical scientists engaged in deciphering how human immune cells mediate immune responses and contribute to the development of autoimmune diseases.

    2. Reviewer #1 (Public review):

      Summary:

      As our understanding of the immune system increases, it becomes clear that murine models of immunity cannot always prove an accurate model system for human immunity. However, mechanistic studies in humans are necessarily limited. To bridge this gap, many groups have worked on developing humanised mouse models in which human immune cells are introduced into mice, allowing their fine manipulation. However, since human immune cells will attack murine tissues, it has proven complex to establish a human-like immune system in mice. To help address this, Vecchione et al. have previously developed several models using human cell transfer into mice with or without human thymic fragments that allow negative selection of autoreactive cells. In this report they focus on the examination of the function of the B-helper CD4 T-cell subsets T-follicular helper (Tfh) and T-peripheral helper (Tph) cells. They demonstrate that these cells are able to drive autoantibody production and can also induce B-cell independent autoimmunity.

      Strengths:

      A strength of this paper is that currently there is no well-established model for Tfh or Tph in HIS mice and that currently there is no clear murine Tph equivalent making new models for the study of this cell type of value. Equally, since many HIS mice struggle to maintain effective follicular structures Tfh models in HIS mice are not well established giving additional value to this model.

      Weaknesses:

      A weakness of the paper is that the models seem to lack a clear ability to generate germinal centres in which Tfh may exert some of their key functions. In some cases, the definition of Tph-like does not seem to differentiate well between Tph and highly activated CD4 T-cells in general, partly since the literature around these cells has not fully resolved this point.

    3. Reviewer #2 (Public review):

      Summary:

      Humanized mice, developed by transplanting human cells into immunodeficient NSG mice to recapitulate the human immune system, are utilized in basic life science research and preclinical trials of pharmaceuticals in fields such as oncology, immunology, and regenerative medicine. However, there are limitations to using humanized mice for mechanistic analysis as models of autoimmune diseases due to the unnatural T cell selection, antigen presentation/recognition process, and immune system disruption due to xenogeneic GVHD onset.

      In the present study, Vecchione et al. detailed the mechanisms of autoimmune disease-like pathologies observed in a humanized mouse (Human immune system; HIS mouse) model, demonstrating the importance of CD4+ Tfh and Tph cells for the disease onset. They clarified the conditions under which these T cells become reactive using techniques involving the human thymus engraftment and mouse thymectomy, showing their ability to trigger B cell responses, although this was not a major factor in the mouse pathology. These valuable findings provide an essential basis for interpreting past and future autoimmune disease research conducted using HIS mice.

      Strengths:

      (1) Mice transplanted with human thymus and HSCs were repeatedly executed with sufficient reproducibility, with each experiment sometimes taking over 30 weeks and requiring desperate efforts. While the interpretation of the results is still debatable, this description is valuable knowledge for this field of research.

      (2) Mechanistic analysis of T-B interaction in humanized mice, which has not been extensively addressed before, suggests part of the activation mechanism of autoreactive B cells. Additionally, the differences in pathogenicity due to T cell selection by either the mouse or human thymus are emphasized, which encompasses the essential mechanisms of immune tolerance and activation in both central and peripheral systems.

      Weaknesses:

      (1) In this manuscript, such as Fig. 2, the proportion of suppressive cells like regulatory T cells is not clarified, making it unclear to what extent the percentages of Tph or Tfh cells reflect immune activation. It would have been preferable to distinguish follicular regulatory T cells, at least. While Figure 3 shows Tregs are gated out using CD25- cells, it is unclear how the presence of Treg cells affects the overall cell population immunogenic functionally.

      The authors added the data about FOXP3 expression among Tfh/Tph cells in the revised manuscript. This improved our data interpretation.

      (2) The definition of "Disease" discussed after Fig. 6 should be explicitly described in the Methods section. It seems to follow Khosravi-Maharlooei et al. 2021. If the disease onset determination aligns with GVHD scoring, generally an indicator of T cell response, it is unsurprising that B cell contribution is negligible. The accelerated disease onset by B cell depletion likely results from lymphopenia-induced T cell activation. However, this result does not prove that these mice avoid organ-specific autoimmune diseases mediated by auto-antibodies and the current conclusion by the authors may overlook significant changes. For instance, would defining Disease Onset by the appearance of circulating autoantibodies alter the result of Disease-Free curve? Are there possibly histological findings at the endpoint of the experiment suggesting tissue damage by autoantibodies?

      The authors appropriately modified the manuscript and provided sufficient information about the definition of diseases.

      (3) Helper functions, such as differentiating B cells into CXCR5+, were demonstrated for both Hu/Hu and Mu/Hu-derived T cells. This function seemed higher in Hu/Hu than in Mu/Hu. From the results in Fig. 7-8, Hu/Hu Tph/Tfh cells have a stronger T cell identity and higher activation capacity in vivo on a per-cell basis than Mu/Hu's ones. However, Hu/Hu-T cells lacked an ability to induce class-switching in contrast to Mu/Hu's. The mechanisms causing these functional differences were not fully discussed. Discussions touching on possible changes in TCR repertoire diversity between Mu/Hu- and Hu/Hu- T cells would have been beneficial.

      The authors correctly cited their previous findings about the TCR repertoire variation. This strengthened the discussion of this study.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      As our understanding of the immune system increases it becomes clear that murine models of immunity cannot always prove an accurate model system for human immunity. However, mechanistic studies in humans are necessarily limited. To bridge this gap many groups have worked on developing humanised mouse models in which human immune cells are introduced into mice allowing their fine manipulation. However, since human immune cells will attack murine tissues, it has proven complex to establish a human-like immune system in mice. To help address this, Vecchione et al have previously developed several models using human cell transfer into mice with or without human thymic fragments that allow negative selection of autoreactive cells. In this report they focus on the examination of the function of the B-helper CD4 T-cell subsets T-follicular helper (Tfh) and T-peripheral helper (Tph) cells. They demonstrate that these cells are able to drive both autoantibody production and can also induce B-cell independent autoimmunity.

      Strengths:

      A strength of this paper is that currently there is no well-established model for Tfh or Tph in HIS mice and that currently there is no clear murine Tph equivalent making new models for the study of this cell type of value. Equally, since many HIS mice struggle to maintain effective follicular structures Tfh models in HIS mice are not well established giving additional value to this model.

      Weaknesses:

      A weakness of the paper is that the models seem to lack a clear ability to generate germinal centres. For Tfh it is unclear how we can interpret their function without the structure where they have the greatest influence. In some cases, the definition of Tph does not seem to differentiate well between Tph and highly activated CD4 T-cells in general.

      The limited ability of HIS mice to generate well-defined lymphoid tissue structures is well noted. While the emergence of T cells in HIS mice increases the size of lymphoid tissues, the structure remains suboptimal and vaccination responses are limited. We believe this is mainly due to the common gamma chain knockout, which results in a lack of murine lymphoid tissue inducer (LTi) cells, which require IL-7 signaling to interact with murine mesenchymal cells for normal lymphoid tissue development. Ongoing efforts by our group and others aim to address this challenge by providing the necessary signals. Despite this challenge, these mice do develop Tfh cells, allowing us to study this cell subset.

      We agree with the reviewer that the distinction between Tph and highly activated CD4 T cells is incomplete.

      However, we have provided several distinctions in our manuscript that support the presence of Tph in HIS mice: 1) Tph cells exhibit very high levels of PD-1 expression, whereas other activated CD4 cells have varying levels of PD-1 expression. 2) Tph cells express IL-21. 3) Tph cells promote B cell differentiation and antibody production. 

      Reviewer #2 (Public Review):

      Summary:

      Humanized mice, developed by transplanting human cells into immunodeficient NSG mice to recapitulate the human immune system, are utilized in basic life science research and preclinical trials of pharmaceuticals in fields such as oncology, immunology, and regenerative medicine. However, there are limitations to using humanized mice for mechanistic analysis as models of autoimmune diseases due to the unnatural T cell selection, antigen presentation/recognition process, and immune system disruption due to xenogeneic GVHD onset.

      In the present study, Vecchione et al. detailed the mechanisms of autoimmune disease-like pathologies observed in a humanized mouse (Human immune system; HIS mouse) model, demonstrating the importance of CD4+ Tfh and Tph cells for the disease onset. They clarified the conditions under which these T cells become reactive using techniques involving the human thymus engraftment and mouse thymectomy, showing their ability to trigger B cell responses, although this was not a major factor in the mouse pathology. These valuable findings provide an essential basis for interpreting past and future autoimmune disease research conducted using HIS mice.

      Strengths:

      (1) Mice transplanted with human thymus and HSCs were repeatedly executed with sufficient reproducibility, with each experiment sometimes taking over 30 weeks and requiring desperate efforts. While the interpretation of the results is still debatable, this description is valuable knowledge for this field of research.

      (2) Mechanistic analysis of T-B interaction in humanized mice, which has not been extensively addressed before, suggests part of the activation mechanism of autoreactive B cells. Additionally, the differences in pathogenicity due to T cell selection by either the mouse or human thymus are emphasized, which encompasses the essential mechanisms of immune tolerance and activation in both central and peripheral systems.

      Weaknesses:

      (1) In this manuscript, for example in Figure 2, the proportion of suppressive cells like regulatory T cells is not clarified, making it unclear to what extent the percentages of Tph or Tfh cells reflect immune activation. It would have been preferable to distinguish follicular regulatory T cells, at least. While Figure 3 shows Tregs are gated out using CD25- cells, it is unclear how the presence of Treg cells affects the overall cell population immunogenic functionally.

      We analyzed the % FOXP3+ cells and the % of ICOS+ cells within the Tfh and Tph cells in the spleen of Hu/Hu and Mu/Hu mice at 20 weeks post-transplantation. Importantly, we see no difference in FOXP3 expression between Tfh of Mu/Hu and Hu/Hu mice. The results have been added to panels J and K of Figure 2. 

      (2) The definition of "Disease" discussed after Figure 6 should be explicitly described in the Methods section. It seems to follow Khosravi-Maharlooei et al. 2021. If the disease onset determination aligns with GVHD scoring, generally an indicator of T cell response, it is unsurprising that B cell contribution is negligible. The accelerated disease onset by B cell depletion likely results from lymphopenia-induced T cell activation. However, this result does not prove that these mice avoid organ-specific autoimmune diseases mediated by auto-antibodies and the current conclusion by the authors may overlook significant changes. For instance, would defining Disease Onset by the appearance of circulating autoantibodies alter the result of Disease-Free curve? Are there possibly histological findings at the endpoint of the experiment suggesting tissue damage by autoantibodies?

      We have added a definition of disease to the Methods section as requested. Regarding the possibility of antibody-mediated disease that may be missed by this definition, we acknowledge this point in the Discussion section. However, we also discuss the point that the deficient complement pathway in NSG mice is likely to have protected the HIS mice from autoantibody-mediated organ damage.

      (3) Helper functions, such as differentiating B cells into CXCR5+, were demonstrated for both Hu/Hu and Mu/Hu-derived T cells. This function seemed higher in Hu/Hu than in Mu/Hu. From the results in Figures 7-8, Hu/Hu Tph/Tfh cells have a stronger T cell identity and higher activation capacity in vivo on a per-cell basis than Mu/Hu's ones. However, Hu/Hu-T cells lacked an ability to induce class-switching in contrast to Mu/Hu's. The mechanisms causing these functional differences were not fully discussed. Discussions touching on possible changes in TCR repertoire diversity between Mu/Hu- and Hu/Hu- T cells would have been beneficial.

      Consistent with the reviewer’s suggestion, we have previously shown that the TCR repertoire in Mu/Hu mice is less diverse than that in Hu/Hu mice (Khosravi-Maharlooei M, et al., J Autoimmun., 2021). We believe that the narrowed TCR repertoire in the periphery of Mu/Hu mice, combined with the inadequate negative selection in the murine thymus reported in the paper cited above, results in selective peripheral expansion primarily of the few T cell clones that are cross-reactive with HLA/murine self-peptide complexes presented by human APCs in the periphery. We have discussed the reasons why these cells, when transferred to secondary recipients containing the same APCs, might not be as active as the more diverse, HLA-selected T cell repertoire transferred from Hu/Hu mice. These possible reasons include exhaustion of the T cells in Mu/Hu mice, limited expression of the few targeted HLA-peptide complexes recognized by the narrow cross-reactive TCR repertoire of Mu/Hu T cells, and the consequent relatively impaired T-B cell collaboration in these mice.

      Recommendations for the authors:  

      Reviewer #1 (Recommendations For The Authors):

      The authors note that they removed an outlier result from Figures 1 B & C. With only 4 mice it seems difficult to see exactly how they determined the result was an outlier. Presumably, it was quite different from the others but in such a small dataset removing data without a very clear statistical rationale seems likely to strongly influence the results.

      We have revised Figure 1 to include the previously deleted outlier mouse.

      Figure 4. The authors describe the follicular area. Were they able to observe any GC-like structures in their data?

      From the examples, I can see that the PNA staining is sometimes diffuse but even if the authors felt they could not observe a distinct GC this should be stated and discussed in the text.

      We now describe the three-color IF staining in more detail in accordance with this comment. We characterized 4 Hu/Hu and 3 Mu/Hu spleens earlier than 20 weeks post-transplant. In all of these mice, distinct B cell areas (CD20+) were obvious and PNA+ cells were more concentrated in the B cell zones. We stained 4 Hu/Hu and 3 Mu/Hu spleens from mice between 20 and 30 weeks post-transplant and found that B cell areas were smaller in all these spleens compared to those taken before 20 weeks post-transplant. PNA+ areas were also more diffusely distributed and were not enriched in the B cell areas. Only 2 Mu/Hu mice showed clear B cell zones with some enriched PNA+ areas in the B cell zones. Additionally, we stained 2 Hu/Hu and 2 Mu/Hu mice later than week 30 post-transplant. No distinct B cell areas were observed in any of the spleens of these mice and PNA+ cells were diffusely distributed.

      In Figure 3E the authors sort CD25-CXCR5-CD45RA- CD4 T-cells as Tph. This does seem a very loose definition including essentially all non-naïve CD4 cells that are not Tregs or Tfh.

      We agree with the reviewer that the distinction between Tph and highly activated CD4 T cells is incomplete.

      However, we have provided several distinctions in our manuscript that support the presence of Tph in HIS mice: 1) Tph cells exhibit very high levels of PD-1, whereas other activated CD4 cells have varying levels of PD-1 expression. 2) Tph cells express IL-21. 3) Tph cells promote B cell differentiation and antibody production. 

      Tph is sometimes a hard cell type to separate from more general highly activated CD4 T-cells. The broad CXCR5-PD1+ phenotype they have used is common in the literature and the authors have confirmed some enrichment of IL21 production by these cells. However, they should consider if there are ways of further confirming this by examination of other markers such as CCR2 and CCR5 or elimination of other effector identities such as Th1 and Th17 or PD1+ exhaustion phenotypes.

      For this study, we chose to follow the commonly used definitions in the literature for Tph and Tfh cells. For this reason, we are careful to refer to “Tph-like” cells rather than Tph cells in this manuscript. Distinguishing Tph cells from other subsets of activated CD4 cells would require further studies such as single cell RNA seq, which we hope to be able to perform in the future with additional funding.  

      Figure 8. The authors perform some analysis of B-cell phenotypes looking at markers such as CD27, IgD in 8B, and CD11c in 8C. Why is CD11c considered in isolation? The level of expression of the other markers would change how this data would be interpreted e.g. IgD-CD27-CD11c+ = DN2/Atypical cells, IgD-CD27+CD11c+ = Activated or age-associated, etc.

      In response to this comment, we reanalyzed the splenic samples of the donor Mu/Hu and Hu/Hu mice and their adoptive recipients. Interestingly, in the T cell donors, the Mu/Hu B cells included greater proportions of activated/age-associated B cells (IgD-CD27+CD11c+) and atypical cells (IgD-CD27-CD11c+), compared to the Hu/Hu B cells. This is consistent with the increased disease, increased Tph/Tfh and increased IgG antibody findings in the primary Mu/Hu compared to Hu/Hu mice. These results have been added to Figure 5G. We performed a similar analysis in the blood (week 9) and spleen of adoptive recipient mice. These studies showed that activated/age-associated B cells (IgD-CD27+CD11c+) and atypical cells (IgD-CD27-CD11c+) were significantly increased in the adoptive recipients of Hu/Hu Tph and Tfh cells compared to the adoptive recipients of Mu/Hu Tph and Tfh cells (Fig. 8C). These results are consistent with the disease, T cell expansion and antibody results in the adoptive recipients.

      "Data not shown" occurs often in this manuscript. In some cases what is not shown is potentially important. The authors note in the text relating to Figure 7 that the "purity of the cell populations as assessed by FCM ranged from 56-60% (data not shown)". Those numbers are a little alarming. Are they referring to the purity of the FACS-sorted Tfh and Tph prior to transfer? Currently, some of the discussion of this paper is about the possibility of plasticity, with Tfh switching into a Tph phenotype. If the transferred cell populations are 56-60% pure I don't think it is possible to make any interpretation of plasticity.

      We looked into this further and realized that the purity figure cited in the original manuscript was erroneous due to a misunderstanding on the part of the first author of a question from the senior author. Unfortunately, data on the purity of the FACS-sorted population was not saved. However, we have added panel B to Figure 7 to show the sorting strategy for Tfh and Tph cells.   We agree that any discussion of plasticity between these cell types is speculative, as outgrowth of a minor population is possible even from well-purified sorted cells.  

      Minor points:

      Some graphs have issues with presentation; Figures 5D and 5E, split scale clips data points. 5F the color representing time would be better replaced with direct labels. 6C and 6C some distortion of text clipping other elements.

      We changed the y-axis scales in 5D and 5E to avoid cutting off the data points. We also changed the labels in 5F. The text distortion and clipping of other elements in Fig 6E and 6A have been corrected.

      The abbreviation LIP is used in the abstract without a clear definition until later in the text.

      This abbreviation has been defined again in the text.

      Generally, the discussion section is quite long.

      We agree that the discussion is quite long, but the results are quite complex and require considerable discussion.  We have attempted to be as concise as possible.

      Reviewer #2 (Recommendations For The Authors):

      Suggestion

      Can Supplementary Figures be merged into the mains for the convenience of readers? There is enough extra margin.

      We prefer to keep the order of main and supplementary figures as they are. 

      There are some confusing results for which I would recommend adding an explanation for readers. For example, about 10% of Hu/Hu CD3+ T cells reacted to Auto-DC in Figure 1B, but neither CD4+ nor CD8+ cells did in Figure 1C.

      We have re-analyzed the data in Fig 1 and included the previously-deleted outlier mouse. 

      Minor

      Figure 3C

      The figure legend does not explain the figure. Hu/Mu or Mu/Mu?

      Both groups were combined in the figure, as the results were similar for both.  The N per group is given in the figure legend.  The same applies to figure 3D.

      Figure 4B, 4C

      Why were Hu/Hu and Mu/Hu data merged only in 4B? They should be discussed in the context of parallel comparison. Both y-axis labels are the same between B and C despite the legend saying differently.

      We switched the order of Figure 4B and 4C, each of which serves a different purpose. Figure 4B aims to demonstrate the similarity between the two groups at each timepoint.  Figure 4C combines the two groups in order to provide sufficient animal numbers to demonstrate the statistically significant changes over time. 

      Figure 5D

      The axis label was missing and the uncertain bar emerged. The authors should replace it with the corrected one.

      The axis and the bar in 5D have been corrected.

      Figure 5F

      The legend does not explain the figure. What are these numbers? Also, it is better if the authors add a detailed explanation to the manuscript about the reason why the sum of antibody titer represents the poly-reactivity of IgM in these mice.

      The numbers in the previous version of the figure were eartag numbers, which we have now renumbered as animals 1, 2, 3, etc. in each group. Please refer to the final paragraph of the "Autoreactivity of IgM and IgG in HIS Mice" section in the Results section for an explanation of IgM polyreactivity.

      Fig. 7D-E etc.

      The definition of Asterisk is insufficient. Between what to what in the multiple comparisons?

      The green asterisks show significant differences between the Tph in Hu/Hu vs Mu/Hu mice, while the orange asterisks show significant differences between the Tfh in Hu/Hu vs Mu/Hu mice. This has been added to the figure legend.

      Figure 7 ~ Figure 8

      The legends on the figure are confusing due to the different order of figures. The scales are inappropriate in some figures. The readers cannot interpret the data from the unfairly compressed plots.

      We made the plots bigger to make them readable and changed the order.

      Methods

      In the description of B cell depletion Experiments, the authors should directly mention the figure number instead of "In the second Experiment ..."

      We have corrected this in the Methods section.

      There is no definition of how to define the "disease" onset.

      This definition has been added to the Methods section.

      Several undefined abbreviations: "LIP", "BLT" ...

      We defined these in the text.

    1. eLife Assessment

      This important paper on measuring molecular connectivity using combined serotonin PET and resting-state fMRI provides both novel methods for studying the brain as well as insights into the effects of ecstasy administration. The methods are convincing, with the high anaesthetic dose used likely limiting network activity.

    2. Reviewer #1 (Public review):

      This paper by Ionescu et al. applies novel brain connectivity measures based on fMRI and serotonin PET both at baseline and following ecstasy use in rats. There are multiple strengths to this manuscript. First, the use of connectivity measures using temporal correlations of 11C-DASB PET, especially when combined with resting state fMRI, is highly novel and powerful. The effects of ecstasy on molecular connectivity of the serotonin network and salience network are also quite intriguing.

      The authors discussed their use of high-dose (1.3%) isoflurane in the context of a recent consensus paper on rat fMRI (Grandjean et al., "A Consensus Protocol for Functional Connectivity Analysis in the Rat Brain.") which found that medetomidine combined with low dose isoflurane provided optimal control of physiology and fMRI signal. The authors acknowledge their suboptimal anaesthetic regimen, which was chosen before the publication of the consensus paper. This likely explains, in part, why fMRI ICs in figure 2A appear fairly restricted.

      The PET ICs appear less bilateral than the fMRI ICs, which the authors attribute to lower SNR.

    3. Reviewer #2 (Public review):

      Summary:

      The article aims to describe a novel methodology for the study of brain organization, in comparison to fMRI functional connectivity, under rest vs. controlled pharmacological stimulation.

      Strengths:

      Solid study design with pharmacological stimulation applied to assess the biological significance of functional and (novel) molecular connectivity estimates.

      Provides relevant information on the multivariate organization of serotoninergic system in the brain.

      Provides relevant information on the sensitivity of traditional (univariate PET analysis, fMRI functional connectivity) and novel (molecular connectivity) methods in measuring pharmacological effects on brain function.

      Comments on revisions:

      I thank the authors for carefully addressing my comments and in particular for the interesting insights added to the discussion.

      I have just one last remark pertaining to the point of the sample size: rats undergoing the MDMA acute challenge constitute a relatively small sample (N=11); I feel there is a certain risk the results presented might not be particularly replicable. Could the authors prove the stability of their (main) results by randomly iterating the individuals included in their sample (e.g. via permutation tests)? Alternatively, including at least a justification of the sample size in the context of the available evidence would be valuable.
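
      As an illustration of the kind of check suggested here, the sketch below runs an exact sign-flip permutation test on paired differences. It assumes each rat contributes a single difference score (for example, connectivity after minus before the MDMA challenge); that design, the function name `sign_flip_p_value`, and the example values are hypothetical and are not taken from the manuscript. With N=11 there are only 2^11 = 2048 sign patterns, so full enumeration is feasible.

```python
from itertools import product


def sign_flip_p_value(differences):
    """Exact two-sided sign-flip permutation test on paired differences.

    Under the null hypothesis of no effect, each difference is equally
    likely to be positive or negative, so every sign pattern is enumerated.
    """
    n = len(differences)
    observed = abs(sum(differences) / n)
    at_least_as_extreme = 0
    total = 0
    for signs in product((1, -1), repeat=n):
        stat = abs(sum(s * d for s, d in zip(signs, differences)) / n)
        # Small tolerance so ties with the observed statistic are counted.
        if stat >= observed - 1e-12:
            at_least_as_extreme += 1
        total += 1
    return at_least_as_extreme / total


# Hypothetical per-rat changes (N=11); these are not data from the study.
example_differences = [-0.21, -0.10, -0.35, 0.05, -0.18, -0.27,
                       -0.02, -0.30, -0.15, 0.08, -0.22]
print(sign_flip_p_value(example_differences))
```

      A subsampling variant (repeating the analysis on random subsets of rats) would address stability more directly; the enumeration above only tests whether the mean change differs from zero.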

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1:

      Comment 1 - I would like the authors to discuss and justify their use of high-dose (1.3%) isoflurane. A recent consensus paper on rat fMRI (Grandjean et al., "A Consensus Protocol for Functional Connectivity Analysis in the Rat Brain.") found that medetomidine combined with low dose isoflurane provided optimal control of physiology and fMRI signal. To overcome any doubts about the effects of the high-dose anaesthetic I'd encourage the authors to show the results of their functional connectivity specificity using the same or similar image processing protocol as described in that consensus paper. This is especially true since the fMRI ICs in Figure 2A appear fairly restricted.

      We thank the reviewer for their insightful comments. We agree that the combination of medetomidine and isoflurane, as recommended by Grandjean et al. in their consensus paper, provides superior physiological stability and fMRI signal quality, and should indeed be considered the preferred protocol for future studies. In fact, we have adopted this combination in our subsequent research [1]. However, the data in the present study were acquired prior to the publication of the consensus recommendations and have been previously published [2, 3]. While isoflurane is not the ideal anesthetic for functional connectivity studies, we have demonstrated in earlier work [4] that using isoflurane at 1.3% maintains stable physiological parameters and avoids burst suppression, a key issue with higher isoflurane doses.

      Regarding preprocessing, we acknowledge the importance of standardized approaches as outlined in the consensus paper. However, to maintain methodological consistency with our prior work, we retained the original preprocessing pipeline for this study. This decision ensures comparability with our previous analyses. To address the reviewer’s concerns and encourage further verification, we have uploaded the full dataset to a public repository (as suggested in Comment 4). This will enable other researchers to reanalyze the data using updated preprocessing pipelines or explore additional analyses.

      We have updated the manuscript discussion (page 19) to clearly acknowledge these points:

      “One limitation of our study is that our experimental protocols predate the recently published consensus recommendations for rat fMRI [42], particularly concerning anesthesia and preprocessing pipelines. The use of isoflurane anesthesia, although common at the time of data acquisition, introduces a potential confound due to its known effects on neuronal activity. However, we previously demonstrated that isoflurane at 1.3% maintains stable physiological parameters and avoids burst suppression [43], a concern at higher doses. Furthermore, other studies have reported that low-dose isoflurane remains feasible for resting-state functional connectivity studies [44]. While isoflurane, as a GABA-A agonist, could theoretically interact with the mechanisms of MDMA in the brain, we found no evidence in the literature suggesting significant cross-talk between these substances. Future studies employing medetomidine-based protocols may help minimize this potential confound.

      Regarding data preprocessing, we chose to retain the same pipeline used in our prior publications [13, 14] to maintain methodological consistency. While we recognize the advantages of adopting standardized preprocessing as outlined in the consensus guidelines, this approach ensures comparability with our previous analyses. To facilitate further investigation, we have made the full dataset publicly available (see Data Availability Statement), enabling reanalysis with updated pipelines or additional explorations of this dataset.”

      Comment 2 - I'd also be interested to read more about why the cerebellum was chosen as a reference region, given that serotonin is highly expressed in the cerebellum, and what effects the choice of reference region has on their quantification.

      We have examined this ourselves in a paper dedicated to determining the most suitable reference region for [11C]DASB. While the reviewer is correct that there is also serotonin in the cerebellum, we found the lowest binding for this tracer in the cerebellar gray matter, which supports this region as a valid reference area (“Displaceable binding of (11)C-DASB was found in all brain regions of both rats and mice, with the highest binding being in the thalamus and the lowest in the cerebellum. In rats, displaceable binding was largely reduced in the cerebellar cortex”; please refer to [5]).

      We amended our materials and methods part to specify that we had shown in this previous publication that the cerebellar gray matter is appropriate as a reference region (page 6):

      “Binding potentials were calculated frame-wise for all dynamic PET scans using the DVR-1 (equation 1) to generate regional BPND values with the cerebellar gray matter as a reference region, which our earlier studies have demonstrated to be the most appropriate for this tracer in rats [5, 6]:”
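
      For readers unfamiliar with the DVR-1 approach, a minimal sketch follows. It assumes the frame-wise DVR is approximated by the ratio of the regional activity to the reference-region (cerebellar gray matter) activity in each frame, which this response does not spell out; the function name `framewise_bpnd` and the activity values are hypothetical.

```python
def framewise_bpnd(target_tac, reference_tac):
    """Return BP_ND = DVR - 1 for each frame.

    Assumption: DVR is taken as target activity / reference activity
    per frame; the manuscript's equation 1 may differ in detail.
    """
    if len(target_tac) != len(reference_tac):
        raise ValueError("time-activity curves must have the same number of frames")
    bpnd = []
    for target, reference in zip(target_tac, reference_tac):
        if reference <= 0:
            raise ValueError("reference activity must be positive")
        bpnd.append(target / reference - 1.0)
    return bpnd


# Hypothetical activities (arbitrary units) for three frames.
thalamus = [30.0, 27.0, 24.0]
cerebellar_gray_matter = [10.0, 9.0, 8.0]
print(framewise_bpnd(thalamus, cerebellar_gray_matter))
```

      Because the reference activity is the denominator, any specific binding in the reference region lowers the resulting BP_ND values, which is why the choice of reference region matters for quantification.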

      Comment 3 - The PET ICs appear less bilateral than the fMRI ICs. Is that simply a thresholding artefact or is it a real signal?

      We thank the reviewer for this observation. The reduced bilaterality of PET ICs compared to fMRI ICs is likely due to the inherent limitation in the temporal resolution of PET, which provides significantly fewer frames (100 frames compared to 3000 frames for fMRI). This lower temporal resolution leads to a reduced signal-to-noise ratio when computing the ICA, which can affect the stability and symmetry of the ICs, particularly at higher IC numbers. While thresholding may also play a minor role, we believe the primary factor is the poorer SNR associated with the PET data. We have clarified this point in the discussion section (page 17) as follows:

      “In our analysis, PET ICs appeared less bilateral than fMRI ICs. This is likely due to the lower temporal resolution of PET (100 frames) compared to fMRI (3000 frames), resulting in reduced signal-to-noise ratio (SNR) and potentially affecting the stability and symmetry of the independent components.”

      Comment 4 - "The data will be made available upon reasonable request" is not sufficient - please deposit the data in an open repository and link to its location.

      We agree with the request of the reviewer and uploaded the data to a Dryad repository. We amended our Data Availability Statement accordingly.

      Comment 5 (recommendation) - Please add the age and sex of the rats in lines 92-97.

      Amended.

      Comment 6 (recommendation) - There are multiple typos throughout the manuscript - for example, "z-vlaue" on line 164, "negligable" on line 194, etc.. Sometimes the 11 in 11C is superscripted, sometimes it isn't. This paper would benefit from a careful proofread.

      Thank you for pointing this out. We sent the manuscript for language and grammar editing to AJE (see certificate).

      Reviewer 2:

      Comment 1 - While the study protocol is referenced in the paper, it would be useful to at least report whether the study uses bolus, constant infusion, or a combination of the two and the duration of the frames chosen for reconstruction. Minimal details on anesthesia should also be reported, clarifying whether an interaction between the pharmacological agent for anesthesia and MDMA can be expected (whole-brain or in specific regions).

      We fully agree that this would improve the readability of our manuscript and added the information to the materials and methods and discussion accordingly. Please refer to page 4/5.

      Comment 2 - Some terminology is used in a bit unclear way. E.g. "seed-based" usually refers to seed-to-voxel and not ROI-to-ROI analysis, or e.g. it is a bit confusing to have IC1 called SERT network when in fact all ICs derived from DASB data are SERT networks. Perhaps a different wording could be used (IC1 = SERT xxxxx network; IC2= SERT salience network).

      Based on the reviewer's suggestion, we propose renaming IC1 and IC2 according to their anatomical and functional characteristics (page 13):

      “IC1 = SERT Salience Network: This name highlights the involvement of the regions typically associated with the salience network (e.g., CPu, Cg, NAc, Amyg, Ins, mPFC), which play key roles in emotional and cognitive processing.”

      “IC2 = SERT Subcortical Network: This name reflects the involvement of subcortical regions which play a role in arousal, stress response, and autonomic regulation, which are heavily modulated by serotonin in areas like the hypothalamus, PAG, and thalamus.”

      Comment 3 - The limited sample size for the rats undergoing pharmacological stimulation which might make the study (potentially) not particularly powerful. This could not be a problem if the MDMA effect observed is particularly consistent across rats. Information on inter-individual variability of FC, MC, and BPND could be provided in this regard.

      We thank the reviewer for raising this point. To address the concern about limited sample size and inter-individual variability, we have added this information to Figures 5 B and D. Regarding the BPND variability, the dotted lines in Figure 3 indicate the standard deviation in the regional BPNDs, however, this was not clearly stated in the original figure description. We have now amended the figure legend to explicitly clarify this point.

      Comment 4 (recommendation) - "Our research employs a novel approach named "molecular connectivity" (MC), which merges the strengths of various imaging methods to offer a comprehensive view of how molecules interact within the brain and affect its function." I'd recommend rephrasing to "..how molecules interact across different areas within the brain..". Molecular connectivity is a potentially ambiguous term (used to study interactions across different molecules (in the same compartment/environment) vs. to study interactions across the same molecules in different areas). I'd add a couple of references to help the reader disambiguate too (e.g. https://pubmed.ncbi.nlm.nih.gov/30544240/ , https://pubmed.ncbi.nlm.nih.gov/36621368/)

      We appreciate the reviewer’s suggestion and agree that the term "Molecular Connectivity" could be ambiguous. To clarify, we rephrased the description to emphasize that our approach specifically examines interactions of the same molecule (i.e., serotonin transporter) across different brain regions, rather than interactions between different molecules within the same environment. We propose the following revised text (page 2):

      “Our research employs a novel approach termed molecular connectivity (MC), which combines the strengths of various imaging methods to provide a comprehensive view of how specific molecules, such as the serotonin transporter, interact across different brain regions and influence brain function.”

      Additionally, we will incorporate the suggested references to help the reader further contextualize the use of this term.

      Comment 5 - In the methods, it is not clear if for MC the authors also compute ROI-to-ROI correlations or only ICA.

      Thank you for highlighting this point. To clarify, our MC analysis includes both ROI-to-ROI correlations and ICA. Specifically, as described at the end of the “Molecular Connectivity Analysis” subchapter, we compute ROI-to-ROI correlations using the following steps: 1. The first 20 minutes of each scan are discarded to account for perfusion effects. 2. A detrending approach is applied to the remaining 60 minutes of BP<sub>ND</sub> time courses. 3. ROI-to-ROI correlations are then calculated and organized into subject-level correlation matrices, which are subsequently z-transformed to generate mean correlation matrices across subjects.

      We revised the methods section to explicitly state that both ROI-to-ROI correlations and ICA are integral components of the MC analysis to ensure this point is clear to readers (page 6).

      “The BP<sub>ND</sub> time courses were then used to calculate MC as described above for fMRI: ROI-to-ROI subject-level correlation matrices between all regional time courses were generated and z-transformed correlation coefficients were used to calculate mean correlation matrices.”
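
      The three steps described in this response can be sketched as follows. This is an illustrative outline only: it assumes linear detrending and a Fisher r-to-z transform (the response says only "a detrending approach" and "z-transformed"), and all function names and the `frames_to_discard` parameter are hypothetical.

```python
import math


def detrend(series):
    """Remove a least-squares straight line from a time course."""
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    slope = sxy / sxx
    return [y - (y_mean + slope * (t - t_mean)) for t, y in enumerate(series)]


def pearson(x, y):
    """Pearson correlation coefficient of two equally long series."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def fisher_z(r, limit=0.999999):
    """Fisher r-to-z transform; r is clamped so atanh stays finite."""
    return math.atanh(max(-limit, min(limit, r)))


def subject_z_matrix(roi_series, frames_to_discard):
    """Z-transformed ROI-to-ROI correlation matrix for one subject.

    roi_series: one BP_ND time course per ROI.
    frames_to_discard: number of initial frames dropped (the first
    20 minutes in the text; the frame count depends on frame length).
    """
    cleaned = [detrend(ts[frames_to_discard:]) for ts in roi_series]
    n = len(cleaned)
    z = [[0.0] * n for _ in range(n)]  # diagonal left at 0 by convention here
    for i in range(n):
        for j in range(i + 1, n):
            z[i][j] = z[j][i] = fisher_z(pearson(cleaned[i], cleaned[j]))
    return z


def mean_matrix(matrices):
    """Element-wise mean of subject-level matrices."""
    n = len(matrices[0])
    return [[sum(m[i][j] for m in matrices) / len(matrices) for j in range(n)]
            for i in range(n)]
```

      Averaging in z-space rather than averaging raw correlation coefficients is the usual reason for the transform, since correlation coefficients are bounded and not normally distributed.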

      Comment 7 - In the discussion, it could be useful to relate IC1 and IC2 to well-established neuroanatomical/molecular knowledge of the serotoninergic system. Did the authors expect the IC1 and IC2 anatomical distributions? Is there a plausible biological reason as to why the time courses of BPnd variations would be somehow different between IC1 and IC2?

      We appreciate the reviewer’s insightful comment and agree on the importance of relating IC1 and IC2 to well-established neuroanatomical and molecular knowledge of the serotonergic system.

      In our discussion, we noted that IC1 primarily encompasses subcortical structures such as the brainstem, midbrain, and thalamus. These regions are consistent with areas housing dense serotonergic projections originating from the raphe nuclei, the primary source of serotonin release. In contrast, IC2 involves limbic and cortical regions - including the striatum, amygdala, cingulate, insular, and prefrontal cortices - which are key targets of the serotonergic pathways. This anatomical distinction aligns with the hierarchical organization of the serotonergic system, where the brainstem nuclei exert both local and distal serotonergic modulation.

      The observed differences in the temporal dynamics of the binding potential (BP<sub>ND</sub>) variations between IC1 and IC2 likely reflect the distinct functional roles of these regions within the serotonergic network. The more immediate changes in IC1 could be attributed to the direct effect of MDMA on the raphe nuclei, leading to rapid serotonin release in subcortical structures. In contrast, the delayed changes in IC2 may reflect downstream modulation in cortical and limbic regions involved in processing more complex emotional and cognitive functions.

      That said, while these interpretations are plausible based on current neuroanatomical and functional knowledge, the exact biological mechanisms underlying the differential time courses remain unclear. As discussed in the manuscript, future studies incorporating direct, simultaneous measurements of serotonin levels and imaging data will be essential to fully elucidate the temporal and spatial dynamics of serotonin transmission in these regions. We have revised the discussion section (page 17) to better highlight this limitation as an important area for further investigation:

      “Our results demonstrate that compared with FC, MDMA induces more pronounced changes in MCs, particularly in regions associated with the SERT subcortical network. The distinct temporal dynamics of BPnd variations between these components may reflect the hierarchical organization of the serotonergic system. Specifically, the raphe nuclei, as the primary source of serotonin, are likely to exert more immediate modulation on posterior subcortical structures (IC2), whereas downstream effects on limbic and cortical regions (IC1) may occur more gradually. While these findings align with current neuroanatomical and molecular knowledge, the precise biological mechanisms driving these temporal differences remain unclear. Future investigations are warranted to elucidate these mechanisms. Future studies combining direct measurements of serotonin levels with neuroimaging data will be critical to fully understanding these components’ distinct roles and temporal profiles in regulating serotonergic function.”

      Comment 8 - In the discussion (physiological basis), could the authors detail the expected "time scale" in changes in SERT expression? How quickly can SERT expression change, especially under resting-state conditions? Is it reasonable to consider tracer fluctuations under rest conditions as biologically meaningful?

      SERT regulation can occur over different time scales depending on the mechanism involved [7].

      Acute, rapid changes (milliseconds to seconds): Protein-protein interactions with key regulatory proteins (e.g., syntaxin1A, neuronal nitric oxide synthase) can lead to rapid modulation of SERT surface expression [8-11]. These interactions often involve changes in transporter trafficking or conformational states and can occur within milliseconds to seconds. For example, syntaxin1A directly interacts with the N-terminus of SERT, influencing its availability on the plasma membrane within short timescales.

      Intermediate time scales (seconds to minutes): Posttranslational modifications, such as phosphorylation by kinases (e.g., protein kinase C) or dephosphorylation by phosphatases, are known to influence SERT function and surface expression [12-14]. These processes are typically initiated in response to cellular signaling and occur over seconds to minutes, affecting the SERT trafficking dynamics and serotonin uptake capacity [15, 16].

      Longer-term changes (minutes to hours): Longer-term regulation involves processes like endocytosis, recycling, or degradation of SERT. These pathways typically take minutes to hours and are often part of more sustained cellular responses to changes in neuronal activity or serotonin levels. Such changes are slower but contribute to the overall cellular homeostasis of SERT under prolonged stimulation.

      Under resting-state conditions, where neurons are not subjected to rapid or dramatic fluctuations in neurotransmitter release or signaling, SERT expression and activity are generally stable but still subject to subtle fluctuations due to ongoing basal regulatory processes. Basal phosphorylation or low-level protein-protein interactions can still dynamically modulate SERT trafficking and function, albeit at a lower intensity than under stimulated conditions. These fluctuations, although smaller in magnitude, may reflect fine-tuning of serotonin homeostasis and can occur on shorter timescales (seconds to minutes).

      Biological Relevance of Tracer Fluctuations at Rest:

      It is reasonable to consider that tracer fluctuations under resting conditions could reflect biologically meaningful variations in SERT expression and function. Even subtle shifts in SERT surface availability or activity can impact serotonin clearance and signaling, given the fine balance required to maintain serotonergic tone. These fluctuations may reflect intrinsic neuronal variability or ongoing homeostatic adjustments to maintain optimal neurotransmitter levels or serve as early indicators of adaptive responses to environmental or physiological changes before more overt modifications in transporter expression or activity become apparent.

      In summary, while SERT expression can change rapidly in response to signaling events (milliseconds to minutes), even under resting-state conditions, subtle regulatory fluctuations can be biologically meaningful. These fluctuations likely reflect ongoing regulatory adjustments essential for maintaining serotonergic balance and should not be disregarded as noise, particularly in experimental measurements using tracers.

      We added the following paragraph to the discussion (page 16):

      In addition, SERT regulation occurs over multiple time scales, ranging from milliseconds to hours, depending on the mechanism involved [31]. Rapid changes in SERT surface expression can be mediated by protein-protein interactions or posttranslational modifications [32, 33], such as phosphorylation, which occur on a timescale of milliseconds to minutes. These processes dynamically modulate surface availability and function, allowing fine-tuned regulation of serotonin uptake even under resting-state conditions. Additionally, while slower processes involving endocytosis, recycling, and degradation typically occur over minutes to hours, subtle fluctuations in SERT trafficking and activity can still occur under basal conditions. These minor yet biologically relevant changes likely reflect ongoing homeostatic regulation essential for maintaining serotonergic balance. Therefore, tracer fluctuations observed during resting-state measurements should not be dismissed, as they may represent meaningful variations in SERT regulation that contribute to the fine control of serotonin clearance.

      Comment 9 - In the discussion, the SERT network results should be commented on more extensively, as there is now only a generic reference to MC changes being stronger than FC ones, without spatial reference to the SERT network (while only negative salience network results are referenced explicitly instead, making the paragraph a bit confusing).

      We expanded the discussion to accommodate a more thorough contemplation of this network. This revised paragraph (page 17) directly addresses the spatial aspects of the SERT network, highlighting the specific regions involved in serotonergic connectivity and contrasting molecular and functional connectivity changes induced by MDMA.

      Comment 10 - Figure 3; I'd switch left and right charts in the bottom panel (last row only), to keep the SERT network always on the left of the Figure.

      We agree with the suggestion and changed the figure accordingly.

      Comment 11 - Figure 4: I'd add FC decreases to the figure, to allow the reader to compare BPnd, MC, and FC changes more easily and I'd add a horizontal line at the equivalent of e.g. Z-1.96 (or similar) so that it is clear which measures/regions display significant changes.

      We prefer to keep the figure focusing on the two analyses of PET alterations, since we want to emphasize their complementarity in the context of PET specifically. However, we added lines indicating significances, in line with the reviewer’s suggestion.

      Comment 12 - In Figure 5D, the y-axis mentioned FC but I suppose it should mention MC.

      We amended the figure accordingly, together with the changes to the names of the networks implemented across the manuscript.

      (1) Marciano, S., et al., Combining CRISPR-Cas9 and brain imaging to study the link from genes to molecules to networks. Proc Natl Acad Sci U S A, 2022. 119(40): p. e2122552119.

      (2) Ionescu, T.M., et al., Striatal and prefrontal D2R and SERT distributions contrastingly correlate with default-mode connectivity. Neuroimage, 2021. 243: p. 118501.

      (3) Ionescu, T.M., et al., Neurovascular Uncoupling: Multimodal Imaging Delineates the Acute Effects of 3,4-Methylenedioxymethamphetamine. J Nucl Med, 2023. 64(3): p. 466-471.

      (4) Ionescu, T.M., et al., Elucidating the complementarity of resting-state networks derived from dynamic [(18)F]FDG and hemodynamic fluctuations using simultaneous small-animal PET/MRI. Neuroimage, 2021. 236: p. 118045.

      (5) Walker, M., et al., In Vivo Evaluation of 11C-DASB for Quantitative SERT Imaging in Rats and Mice. J Nucl Med, 2016. 57(1): p. 115-21.

      (6) Walker, M., et al., Imaging SERT Availability in a Rat Model of L-DOPA-Induced Dyskinesia. Mol Imaging Biol, 2020. 22(3): p. 634-642.

      (7) Lau, T. and P. Schloss, Differential regulation of serotonin transporter cell surface expression. Wiley Interdisciplinary Reviews: Membrane Transport and Signaling, 2012. 1(3): p. 259-268.

      (8) Haase, J., et al., Regulation of the serotonin transporter by interacting proteins. Biochem Soc Trans, 2001. 29(Pt 6): p. 722-8.

      (9) Quick, M.W., Regulating the conducting states of a mammalian serotonin transporter. Neuron, 2003. 40(3): p. 537-49.

      (10) Ciccone, M.A., et al., Calcium/calmodulin-dependent kinase II regulates the interaction between the serotonin transporter and syntaxin 1A. Neuropharmacology, 2008. 55(5): p. 763-70.

      (11) Chanrion, B., et al., Physical interaction between the serotonin transporter and neuronal nitric oxide synthase underlies reciprocal modulation of their activity. Proc Natl Acad Sci U S A, 2007. 104(19): p. 8119-24.

      (12) Qian, Y., et al., Protein kinase C activation regulates human serotonin transporters in HEK-293 cells via altered cell surface expression. J Neurosci, 1997. 17(1): p. 45-57.

      (13) Ramamoorthy, S., et al., Phosphorylation and regulation of antidepressant-sensitive serotonin transporters. J Biol Chem, 1998. 273(4): p. 2458-66.

      (14) Jayanthi, L.D., et al., Evidence for biphasic effects of protein kinase C on serotonin transporter function, endocytosis, and phosphorylation. Mol Pharmacol, 2005. 67(6): p. 2077-87.

      (15) Steiner, J.A., A.M. Carneiro, and R.D. Blakely, Going with the flow: trafficking-dependent and -independent regulation of serotonin transport. Traffic, 2008. 9(9): p. 1393-402.

      (16) Lau, T., et al., Monitoring mouse serotonin transporter internalization in stem cell-derived serotonergic neurons by confocal laser scanning microscopy. Neurochem Int, 2009. 54(3-4): p. 271-6.

    1. eLife Assessment

      This important work provides another layer of regulatory mechanism for TGF-beta signaling activity. The evidence convincingly supports the involvement of microtubules as a reservoir of Smad2/3, and association of Rudhira with microtubules is critical for this process. The work will be of broad interest to developmental biologists in general and molecular biologists in the field of growth factor signaling.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript aimed to study the role of Rudhira (also known as Breast Carcinoma Amplified Sequence 3), an endothelium-restricted microtubule-associated protein, in regulating TGFβ signaling. The authors demonstrate that Rudhira is a critical modulator of TGFβ signaling, acting by releasing Smad2/3 from cytoskeletal microtubules, and show that Rudhira is a Smad2/3 target gene. Taken together, the authors provide a model of how Rudhira contributes to TGFβ signaling activity to stabilize the microtubules, which is essential for vascular development.

      Strengths:

      The study used a range of methods and techniques to achieve its aims and support its conclusions, such as Gene Ontology analysis, functional analysis in culture, immunostaining analysis, and proximity ligation assay. This study reveals a previously unappreciated additional layer of regulation of TGFβ signaling activity after ligand-receptor interaction.

      Weaknesses:

      (1) It is unclear how current findings provide a better understanding of Rudhira KO mice, which the authors published some years ago.

      (2) Why do they use HEK cells instead of SVEC cells in Fig 2 and 4 experiments?

      (3) The model shown in Fig 5E needs improvement so that the findings can be grasped easily.

    3. Author response:

      The following is the authors’ response to the previous reviews

      According to the reviewers' comments, we appreciate your substantial updates. However, the statistical issue remains unsolved. The following is a general way to get fold changes between controls and experimental samples. Each sample will generate relative differences between target molecules and internal controls. For the case of Fig 1B, the target is pSmad2, and the internal control is the total Smad2. Three control samples will generate three numbers for pSmad2/Smad2 ratios with variations. Similarly, T204D samples will generate three numbers with variations. Then, the average of these three numbers will be set as 1 (with variations) to calculate fold changes between the control and T204D groups. The point is that the statistical significance needs to be evaluated between two groups with variations. This standard method differs from what you described in the manuscript. I hope this explains why the issue needs to be fixed. Please work on the following 11 panels to revise.

      (1) Fig 1B, WB, pSmad2, reference Smad2, loading control GAPDH, fold change by T204D.

      (2) Fig 1C, WB, pSmad2, reference Smad2, loading control GAPDH, fold change by Tb/Rudhira.

      (3) Fig 1D, QRT PCR, pai1/mmp9, fold change by Tb treatment, reference not disclosed.

      (4) Fig 2A, migration, crystal red absorbance.

      (5) Fig 2B, migration, crystal red absorbance.

      (6) Fig 4A, QRT PCR, fold change by Tb.

      (7) Fig 4B, WB, Rudhira, fold change by Tb.

      (8) Fig 4C, intensity, with variation, fine.

      (9) Fig 4D, WB, Rudhira, loading control GAPDH, fold change by Smad2/3 silencing.

      (10) Fig 5A, WB, Rudhira/Glu-Tub, loading control GAPDH, fold change by Tb and/or AcD.

      (11) Fig 5C, WB, Glu-Tub.
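
      The normalization requested above can be sketched as follows. This is a minimal illustration, not the authors' analysis: the densitometry values are hypothetical, the helper names `fold_changes` and `welch_t` are introduced here only for the example, and Welch's t statistic is an assumed choice of two-group comparison. Each replicate ratio (e.g. pSmad2/Smad2) is divided by the mean of the control ratios, so the control group averages 1 while keeping its own spread.

```python
from math import sqrt
from statistics import mean, stdev


def fold_changes(control_ratios, treated_ratios):
    """Divide every replicate ratio by the mean of the control ratios."""
    ref = mean(control_ratios)
    return ([r / ref for r in control_ratios],
            [r / ref for r in treated_ratios])


def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom for two groups."""
    va = stdev(a) ** 2 / len(a)
    vb = stdev(b) ** 2 / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical pSmad2 / total Smad2 ratios, three replicates per group.
control = [0.80, 1.00, 1.20]
t204d = [1.90, 2.10, 2.30]

ctrl_fc, trt_fc = fold_changes(control, t204d)
t_stat, df = welch_t(trt_fc, ctrl_fc)

print("control fold changes:", ctrl_fc, "SD:", stdev(ctrl_fc))
print("T204D fold changes:  ", trt_fc, "SD:", stdev(trt_fc))
print("Welch t:", t_stat, "df:", df)
```

      With this scheme the control bar sits at 1 but carries an error bar, and the comparison is made between two groups that both have variance.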

      For western blots:

      Graphs for western blots in the following figures have been modified to show the variance in controls, as suggested:

      (1) Fig 1B, WB, pSmad2, reference Smad2, loading control GAPDH, fold change by T204D.

      (2) Fig 1C, WB, pSmad2, reference Smad2, loading control GAPDH, fold change by Tb/Rudhira.

      (7) Fig 4B, WB, Rudhira, fold change by Tb.

      (9) Fig 4D, WB, Rudhira, loading control GAPDH, fold change by Smad2/3 silencing.

      (10) Fig 5A, WB, Rudhira/Glu-Tub, loading control GAPDH, fold change by Tb and/or AcD.

      (11) Fig 5C, WB, Glu-Tub.

      For qPCRs:

      The reader’s comment asked to display error bars if the variance in controls was considered. The variance in controls was not considered, which is a standard practice in the qPCR assay. In this regard, an example from an eLife paper is cited below (variation not considered in controls):

      Fig 4C from Conti et al., N6-methyladenosine in DNA promotes genome stability, revised v2 Feb 3, 2025.

      Accordingly, the following graphs remain unchanged:

      (3) Fig 1D, QRT PCR, pai1/mmp9, fold change by Tb treatment, reference not disclosed.

      (6) Fig 4A, QRT PCR, fold change by Tb.

      For crystal violet experiments:

      Due to variability in the procedure introduced from CV preparation, uptake, and extraction etc., in the absence of a reference/standard, it is not possible to determine the absolute cell number across experiments. To simplify the calculation, we normalize CV intensity of all the samples to control for an experiment, so the control group doesn’t have error bars. In this regard, an example from an eLife paper is cited below (variation not considered in controls).

      Fig 2H from Brunner et al., PTEN and DNA-PK determine sensitivity and recovery in response to WEE1 inhibition in human breast cancer, version of record July 6, 2020.

      Accordingly, the following graphs remain unchanged:

      (4) Fig 2A, migration, crystal red absorbance.

      (5) Fig 2B, migration, crystal red absorbance.

      Lastly, #8 remains unchanged.

      (8) Fig 4C, intensity, with variation, fine.

    1. eLife Assessment

      The authors provide a compelling method for characterizing communication within brain networks. The study engages important, biologically pertinent, concerns related to the balance of dynamics and structure in assessing the focal points of brain communication. It will be of interest to researchers trying to dissect structure of complex interaction networks across scales, from cells to regions.

    2. Reviewer #2 (Public review):

      Summary:

      The authors provide a compelling method for characterizing communication within brain networks. The study engages important, biologically pertinent, concerns related to the balance of dynamics and structure in assessing the focal points of brain communication. The methods are clear, and seem broadly applicable, although they require some forethought about data and modeling choices.

      Strengths:

      The study is well-developed, providing overall clear exposition of relevant methods, as well as in-depth validation of the key network structural and dynamical assumptions. The questions and concerns raised in reading the text were always answered in time, with straightforward figures and supplemental materials.

      Weaknesses:

      In earlier drafts of the work, the narrative structure at times conflicted with interpretability; however, this was greatly improved during revisions. The only remaining limitation for broad applicability lies in the full observability required in the current paradigm. The authors do point to avenues for relaxing this assumption, which could be fruitful next steps for researchers aiming to deploy this work on EM or two-photon based datasets.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public reviews):

      Summary:

      In this study, Fakhar et al. use a game-theoretical framework to model interregional communication in the brain. They perform virtual lesioning using MSA to obtain a representation of the influence each node exerts on every other node, and then compare the optimal influence profiles of nodes across different communication models. Their results indicate that cortical regions within the brain's "rich club" are most influential.

      Strengths:

      Overall, the manuscript is well-written. Illustrative examples help to give the reader intuition for the approach and its implementation in this context. The analyses appear to be rigorously performed and appropriate null models are included.

      Thank you.

      Weaknesses:

      The use of game theory to model brain dynamics relies on the assumption that brain regions are similar to agents optimizing their influence, and implies competition between regions. The model can be neatly formalized, but is there biological evidence that the brain optimizes signaling in this way? This could be explored further. Specifically, it would be beneficial if the authors could clarify what the agents (brain regions) are optimizing for at the level of neurobiology - is there evidence for a relationship between regional influence and metabolic demands? Identifying a neurobiological correlate at the same scale at which the authors are modeling neural dynamics would be most compelling.

      This is a fundamental point, and we put together a new project to address it. The current work focuses on, firstly, rigorously formalizing a prevailing assumption that brain regions optimize communication, and then uncovering what the characteristics of communication are if this optimization is indeed taking place. Based on our findings, we suspect the mechanism of optimal communication to be broadcasting (compared to other modes explored in our work, e.g., shortest-path signalling or diffusion). However, we recognize that our game-theoretical framework does not directly address “how” this mechanism is implemented. Thus, in our follow-up work, we are analyzing available datasets of signal propagation in the brain to see if communication dynamics there match the predictions of the game-theoretical setup. Following your question, we also extended our discussion to cover this point, cited five other works on this topic, and described what we think could be the neurobiological mechanism of optimal signalling.

      It is not entirely clear what Figure 6 is meant to contribute to the paper's main findings on communication. The transition to describing this Figure in line 317 is rather abrupt. The authors could more explicitly link these results to earlier analyses to make the rationale for this figure clearer. What motivated the authors' investigation into the persistence of the signal influence across steps?

      Great question. Figure 6 in part follows Figure 5, which summarizes a key aspect of our work: Signals subside at every step but not exponentially (Figure 5), and they nearly fall apart after around 6 steps (Figure 6 A and B). Subplots A and B together suggest that although measures like communicability account for all possible pathways, the network uses a handful instead, presumably to balance signalling robustness against the energetic cost of signalling. Subplot C, one of our main findings, then shows that one simple model is all that is needed to predict a large portion of optimal influence compared to other models and variables. In sum, Figure 5 focuses on the decay dynamics while Figure 6 focuses on the extent, in terms of steps, given that the decay is monotonic. Together, our motivation for this figure was to show how the right assumption about decay rate and dynamics can outperform other measures in predicting optimal communication.

      The authors used resting-state fMRI data to generate functional connectivity matrices, which they used to inform their model of neural dynamics. If I understand correctly, their functional connectivity matrices represent correlations in neural activity across an entire fMRI scan computed for each individual and then averaged across individuals. This approach seems limited in its ability to capture neural dynamics across time. Modeling time series data or using a sliding window FC approach to capture changes across time might make more sense as a means of informing neural dynamics.

      We agree with you that static fMRI is limited in capturing neural dynamics. However, we opted not to perform dynamic functional connectivity fitting just yet for a practical reason: Other communication models used here do not fit to any empirical data and provide a static view of the dynamics, comparable to the static functional connectivity. Since one of our goals was to compare different communication regimes, and since fitting dynamics does not seem to substantially change the outcome if the end result is static (Figure 7), we decided to go with the poorer representation of neural data for this work. However, part of our follow-up project involves looking into the dynamics of influence over time and for that, we will fit our models to represent more realistic dynamics.
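
      To make the distinction between static and sliding-window functional connectivity concrete, here is a minimal sketch on synthetic data. It is illustrative only and not part of the manuscript's pipeline: the two signals, the window width, the step, and the helper names `pearson` and `sliding_fc` are all assumptions made for the example. The signals are perfectly coupled in the first half of the recording and anti-coupled in the second half, so a single whole-scan correlation averages the two regimes away.

```python
from math import sin, sqrt


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def sliding_fc(x, y, width, step=1):
    """Correlation within successive windows of `width` samples."""
    starts = range(0, len(x) - width + 1, step)
    return [pearson(x[s:s + width], y[s:s + width]) for s in starts]


# Synthetic 'regional' signals: coupled for samples 0-99, anti-coupled afterwards.
t = [0.3 * i for i in range(200)]
x = [sin(v) for v in t]
y = [sin(v) if i < 100 else -sin(v) for i, v in enumerate(t)]

static_fc = pearson(x, y)                          # one value for the whole scan
windowed_fc = sliding_fc(x, y, width=40, step=20)  # one value per window

print("static FC:", round(static_fc, 3))
print("windowed FC:", [round(v, 2) for v in windowed_fc])
```

      The static estimate stays near zero while the windowed estimates move from strongly positive to strongly negative, which is the kind of temporal structure the reviewer's suggestion is meant to capture.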

      The authors evaluated their model using three different structural connectomes: one inferred from diffusion spectrum imaging in humans, one inferred from anterograde tract tracing in mice, and one inferred from retrograde tract-tracing in macaque. While the human connectome is presumably an undirected network, the mouse and macaque connectomes are directed. What bearing does experimentally inferred knowledge of directionality have on the derivation of optimal influence and its interpretation?

      As for whether directionality changes the interpretation of optimal influence, we think it sets limits on how far we can compare the communication dynamics of these two types of networks. We think interpreting optimal communication in directed graphs requires disentangling incoming influence from outgoing influence, e.g., analyzing “projector hubs/coordinators” and “receiver hubs/integrators” instead of putting both into a common class of hubs. Also, here we showed the extent to which a signal travels before it significantly degrades, having done so in an undirected graph. One implication for a directed graph is the possibility that some nodes can be unreachable from others, given the more restricted navigation. We did not observe this in the human connectome, as all nodes could reach others, although with limited influence (see Figure 2. C). We did not explore these differences, as we used mice and macaque connectomes primarily to control for modality-specific confounds of DSI. However, our relatively poorer fit for directed networks (Supplementary Figure 2) motivated us to analyze how reciprocal connections shape dynamics and what impact they have on networks’ function. Using the same connectomes as the current work, we addressed this question in a separate publication (Hadaeghi et al., 2024) and plan to extend both works by analyzing the signalling properties of directed networks.

      It would be useful if the authors could assess the performance of the model for other datasets. Does the model reflect changes during task engagement or in disease states in which relative nodal influence would be expected to change? The model assumes optimality, but this assumption might be violated in disease states.

      This is a wonderful idea that we initially had in mind for this work as well, but we decided to dedicate a separate work to deviations in different task states, as well as disease states (mainly neurodegenerative disorders). We found that the practical challenges of fitting large-scale models to task dynamics and harmonizing neuroimaging datasets of neurodegenerative disorders are beyond the scope of the current work. Unfortunately, this effort, although exciting and promising, is still pending as the corresponding author does not yet have the required expertise in neuroimaging processing pipelines.

      The MSA approach is highly computationally intensive, which the authors touch on in the Discussion section. Would it be feasible to extend this approach to task or disease conditions, which might necessitate modeling multiple states or time points, or could adaptations be made that would make this possible?

      Continuing our response from the previous point, yes, we think, in theory, the framework is applicable to both settings. Currently, our main point of concern is not the computational cost of the framework but the harmonization of the data, to ensure differences in results are not due to differences in preprocessing steps. However, assuming that all is taken care of, we believe a reasonable compute cluster should suffice by parallelizing the analytical pipeline over subjects. We acknowledge that the process would still be time-consuming, but besides the fitting process, we expect a modern high-performance CPU with about 32–64 threads to take up to 3 days analyzing one subject, given 100 brain regions or fewer. This performance then scales with the number of cluster nodes that can each work on one subject. We note that analytical estimators such as SAR could be used instead, as they largely predict the results from MSA. The limitations are then the lack of dynamics over time and potential estimation errors.

      Reviewer #2 (Public review):

      Summary:

      The authors provide a compelling method for characterizing communication within brain networks. The study engages important, biologically pertinent, concerns related to the balance of dynamics and structure in assessing the focal points of brain communication. The methods are clear and seem broadly applicable, however further clarity on this front is required.

      Strengths:

      The study is well-developed, providing an overall clear exposition of relevant methods, as well as in-depth validation of the key network structural and dynamical assumptions. The questions and concerns raised in reading the text were always answered in time, with straightforward figures and supplemental materials.

      Thank you.

      Weaknesses:

      The narrative structure of the work at times conflicts with the interpretability. Specifically, in the current draft, the model details are discussed and validated in succession, leading to confusion. Introducing a "base model" and "core datasets" needed for this type of analysis would greatly benefit the interpretability of the manuscript, as well as its impact.

      Following your suggestion, we modified the introduction to emphasize the human connectome and the linear model as the main toolkit. We also added a paragraph explaining the datasets that can be used instead.

      Recommendations for the authors:

      Essential Revisions (for the authors):

      (1) The manuscript presents an important and well-validated method for linking structural and functional networks, but it was not clear precisely what the necessary data inputs were and what assumptions about the data mattered. To improve the clarity of the presentation for the reader, it would be beneficial to have an early and explicit description of the flow of the method - what exact kinds of datasets are needed and what decisions need to be made to perform the analysis. In addition, there were questions about how the use or interpretation of the method might change with different methods of measuring structure or function, which could be answered via an explicit discussion of the issue. For example, how do undirected fMRI correlation networks compare to directed tracer injection projection networks? Similarly, could this approach apply in cases like EM connectomics with linked functional imaging that do not have full observability in both modalities?

      This is an important point that we missed addressing in detail in the original manuscript. Now we did so, by first adding a paragraph (lines 292-305, page 10) explaining the pipeline and how our framework handles different modeling choices, and then further discussing it in the Discussion (lines 733-748, page 28). Moreover, we adjusted Figure 1, by delineating two main steps of the pipeline. Briefly, we clarified that MSA is model-agnostic, meaning that, in principle, any model of neural dynamics can be used with it, from the most abstract to the most biologically detailed. Moreover, the approach extends to networks built on EM connectomics, tract-tracing, DTI, and other measures of anatomical connectivity. However, we realized that a key detail was not explicitly discussed (pointed to by Reviewer #2), that is, the fact that these models naturally need to be fitted to the empirical dataset, even though this fitting step appears not to be critical, as shown in Figure 7.

      Lines 292-305:

      “The MSA begins by defining a ‘game.’ To derive OSP, this game is formulated as a model of dynamics, such as a network of interacting nodes. These can range from abstract epidemic and excitable models (Garcia et al., 2012; Messé et al., 2015a) to detailed spiking neural networks (Pronold et al., 2023) and to mean-field models of the whole brain dynamics, as chosen here (see below). The model should ideally be fitted to reflect real data dynamics, after which MSA systematically lesions all nodes to derive the OSP. Put together, the framework is general and model-agnostic in the sense that it accommodates a wide range of network models built on different empirical datasets, from human neuroimaging and electrophysiology to invertebrate calcium imaging, and anything in between. In essence, the framework is not bound to specific modelling paradigms, allowing direct comparison among different models (e.g., see section Global Network Topology is More Influential Than Local Node Dynamics).”
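
      The idea of defining a ‘game’ and lesioning nodes to attribute influence can be illustrated with a toy example. This is a hedged sketch, not the authors' implementation: it assumes MSA refers to a Shapley-value-based multi-lesion analysis (the abbreviation is not expanded in this response), and the three-node ‘network’, its weights, the synergy bonus, and the function names `shapley_values` and `target_activity` are all hypothetical. For a handful of nodes the Shapley values can be computed exactly by averaging each node's marginal contribution over every ordering; realistic networks would need sampling instead of full enumeration.

```python
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values: mean marginal contribution over all orderings."""
    contrib = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        previous = value(coalition)
        for p in order:
            coalition = coalition | {p}
            current = value(coalition)
            contrib[p] += current - previous
            previous = current
    return {p: c / len(orderings) for p, c in contrib.items()}


# Hypothetical source nodes and their direct drive onto one target node.
weights = {"A": 0.5, "B": 0.3, "C": 0.2}


def target_activity(intact):
    """Toy 'game': target activity when only the `intact` sources are unlesioned."""
    drive = sum(weights[n] for n in intact)
    if "A" in intact and "B" in intact:
        drive += 0.2  # assumed synergy between A and B
    return drive


influence = shapley_values(list(weights), target_activity)
print(influence)
```

      In this toy game the synergy bonus is split equally between A and B, and the influences sum to the difference between the fully intact and fully lesioned target activity, which is the efficiency property that makes Shapley-style attributions additive.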

      Lines 733-740:

      “As noted in the introduction, OI is model-agnostic, here, we leveraged this liberty to compare signaling under different models of local dynamics, primarily built upon undirected human connectome data. We also considered different modalities, e.g., tract tracing in Macaque (see Structural and Functional Connectomes under Materials and Methods) to confirm that the influence of weak connections is not inflated due to imaging limitations (Supplementary Figure 5. A). The game theoretical formulation of signaling allows for systematic comparison among many combinations of modeling choices and data sources.”

      We then continued with addressing the issue of full observability. We clarified that in this work, full observability was assumed. However, the mathematical foundations of our method capture unobserved contributors/influencers as an extra term, similar to the additive error term of a linear regression model. To keep the paper as non-technical as possible, we omitted expanding the axioms and the proof of how this is achieved, and instead referred to previous papers introducing the framework. 

      Lines 740-748:

      “Nonetheless, in this work, we assumed full observability, i.e., complete empirical knowledge of brain structure and function that is not necessarily practically given. Although a detailed investigation of this issue is needed, mathematical principles behind the method suggest that the framework can isolate the unobserved influences. In these cases, activity of the target node is decomposed such that the influence from the observed sources is precisely mapped, while the unobserved influences form an extra term, capturing anything that is left unaccounted for, see (Algaba et al., 2019b; Fakhar et al., 2024) for more technical details.”

      (2) The value of the normative game theoretic approach was clear, but the neurobiological interpretation was less so. To better interpret the model and understand its range of applicability, it would be useful to have a discussion of the potential neurobiological correlates that were at the same level of resolution as the modeling itself. Would such an optimization still make sense in disease states that might also be of interest?

      This is a brilliant question, which we decided to explore further in separate studies. Specifically, the link between optimal communication and brain disorders is a natural next step that we are pursuing. Here, we expanded our discussion with a few lines first explaining the roots of our main assumption, which is that neurons optimize information flow, among other goals. We then hypothesized that the biological mechanisms by which this goal is achieved include (based on our findings) adopting a broadcasting regime of signaling. We suspect that this mode of communication, operationalized on complex network topologies, is a trade-off between robust signaling and energy efficiency. Currently, we are planning practical steps to test this hypothesis.

      Lines 943-962:

      “Nonetheless, our framework is grounded in game theory where its fundamental assumption is that nodes aim at maximizing their influence over each other, given the existing constraints. This assumption is well explored using various theoretical frameworks (Buehlmann and Deco, 2010; Bullmore and Sporns, 2012; Chklovskii et al., 2002; Laughlin and Sejnowski, 2003; O’Byrne and Jerbi, 2022) and remains open to further empirical investigation. Here, we used game theory to mathematically formalize a theoretical optimum for communication in brain networks. Our findings then provide a possible mechanism for achieving this optimality through broadcasting. Based on our results, we speculate that, there exists an optimal broadcasting strength that balances robustness of the signal with its metabolic cost. This hypothesis is reminiscent of the concept of brain criticality, which suggests the brain to be positioned in a state in which the information propagates maximally and efficiently (O’Byrne and Jerbi, 2022; Safavi et al., 2024). Together, we suggest broadcasting to be the possible mechanism with which communication is optimized in brain networks, however, further research directions include investigating whether signaling within brain networks indeed aligns with a game-theoretic definition of optimality. Additionally, if it does, subsequent studies could then examine how deviations from optimal communication contribute to or result from various brain states or neurological and psychiatric disorders.”

      Reviewer #1 (Recommendations for the authors):

      I would recommend that the authors consider the following point in a revision, as well as the major weaknesses of the public review. Some aspects of Figure 1 could be clearer. What is being illustrated by the looping arrow to MSA? What is being represented in the matrices (labeling "source" and "target" on the matrix might enhance clarity)? Is R2 the metric used to assess the degree of similarity between communication models? These could be addressed by making small additions to the figure legend or to the figure itself.

      Thank you for your constructive comment on Figure 1, which is arguably the most important figure in the manuscript. We adjusted the figure and its caption (see above) based on your suggestions. After doing so, we think the figure is now clearer regarding the pipeline used in this work.

      Reviewer #2 (Recommendations for the authors):

      Overall, as stated in the public review and the short assessment, the manuscript is in a clearly mature state and brings an important method to link the fields of structural and functional brain networks.

      Nevertheless, the paper would benefit from an early, and clear, discussion of the:

      (1) components of the model, and assumptions of each, should be stated at the end of the introduction, or early in results. (2) datasets necessary to run the analysis.

      The confusion arises from lines 130-131, stating "In the present work (summarized in Figure 1), we used the human connectome, large-scale models of dynamics, and a game-theoretical perspective of signaling." This, to me, indicated that a structural connectivity map may be the only dataset required, as the dynamics model and game theory component are solely simulated. However, later, lines 214-216 state that the empirical functional connectivity is estimated from the structural connectivity, indicating that the method is only applied to cases where we have both.

      Finally, Supplemental Figure 5 validates a number of metrics on different solely structural networks (which is a very necessary and well-done control). Similarly, while the dynamical model is discussed in depth, and beautifully shown that the specific choice of dynamical model does not directly impact the results, it would be helpful to clarify the dynamical model utilized in the early figures.

      Thank you for pointing out a critical detail that we missed elaborating sufficiently early in the paper: the modelling step. Following your suggestions, we added a paragraph from line 292 to 305 (page 10) expanding on the modelling framework. We also explicitly divided the modelling step in Figure 1 and briefly clarified our modelling choices in the caption. Together, we emphasized the fact that our framework is generally model agnostic, which allows different models of dynamics to be plugged into various anatomical networks. We then clarified that, like in any modelling effort, one needs to first fit/optimize the model parameters to reproduce empirical data. In other words, we emphasized the fact that our framework relies on a computational model as its ‘game’ to infer how regions interact, and we fine-tuned our models to reproduce the empirical FC.

      Again, this is not a critique of the methods, which are excellent, but the presentation. It would help readers, and even me, to have a clear indication of the model earlier. Further, it would help to discuss, both in the introduction and discussion, the datasets required for applying these methods more broadly. For instance, 2-photon recordings are discussed - would it be possible to apply this method then to EM connectomes with functional data recorded for them? In theory, it seems like yes, although the current datasets have 100% observability, whereas 2-photon imaging, or other local methods, will not have perfect overlap between structural and functional connectomes. Discussions like this, related to the assumptions of the model, the necessary datasets, and broader application directions beyond DSI, fMRI, and BOLD cases where the method was validated, would increase the impact and interpretability for a broad readership.

      This is a valid point that we should have been more explicit about. The revised manuscript now contains a paragraph (lines 740-748) clarifying the fact that, throughout this work, we assumed full observability. We then briefly discuss, based on the mathematical principles of the framework, what we expect to happen in cases with partial observability. We then point at two references in which the details of a framework with partial observability are laid out, one containing mathematical proofs and the other using numerical simulations.

      References:

      Hadaeghi, F., Fakhar, K., & Hilgetag, C. C. (2024). Controlling Reciprocity in Binary and Weighted Networks: A Novel Density-Conserving Approach (p. 2024.11.24.625064). bioRxiv. https://doi.org/10.1101/2024.11.24.625064

    1. eLife Assessment

      The paper addresses the problem of optimising the mapping of serum antibody responses against a known antigen. The manuscript describes a method using EM polyclonal epitope mapping to help elucidate endogenous antibodies. The work is interesting and valuable to the fields of immunology and serology, and the strength of evidence to support its findings is considered solid.

    2. Reviewer #1 (Public review):

      Summary:

      The paper addresses the problem of optimising the mapping of serum antibody responses against a known antigen. It uses the cryoEM analysis of polyclonal Fabs to antibody genes, with the ultimate aim of getting complete and accurate antibody sequences. The method, commonly termed EMPEM, is becoming increasingly used to understand responses in convalescent sera and optimisation of the workflows and provision of openly available tools is of genuine value to a growing number of people.

      The authors do not address the experimental aspects of the methods and do not present novel computational tools, rather they use a series of established computational methods to provide workflows that simplify the interpretation of the EM map in terms of the sequences of dominant antibodies.

      Strengths:

      The paper is well-written and clearly argued. The tests constructed seem appropriate and fair and demonstrate that the workflow works pretty well. For a small subset (~17%) of the EMPEM maps analysed the workflow was able to get convincing assignments of the V-genes.

    3. Reviewer #2 (Public review):

      In this manuscript, the authors seek to demonstrate that it is possible to sequence antibody variable domains from cryoEM reconstructions in combination with bottom-up LC-MSMS. In particular, they extract de novo sequences from single particle-cryo-EM-derived maps of antibodies using the "deep-learning tool ModelAngelo", which are run through the program Stitch to try to select the top scoring V-gene and construct a placeholder sequence for the CDR3 of both the heavy and light chain of the antibody under investigation. These reconstructed variable domains are then used as templates to guide the assembly of de novo peptides from LC-MS/MS data to improve the accuracy of the candidate sequence.

      Using this approach the authors claim to have demonstrated that "cryoEM reconstructions of monoclonal antigen-antibody complexes may contain sufficient information to accurately narrow down candidate V-genes and that this can be integrated with proteomics data to improve the accuracy of candidate sequences".

    4. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The paper addresses the problem of optimising the mapping of serum antibody responses against a known antigen. It uses the cryoEM analysis of polyclonal Fabs to antibody genes, with the ultimate aim of getting complete and accurate antibody sequences. The method, commonly termed EMPEM, is becoming increasingly used to understand responses in convalescent sera and optimisation of the workflows and provision of openly available tools is of genuine value to a growing number of people.

      The authors do not address the experimental aspects of the methods and do not present novel computational tools, rather they use a series of established computational methods to provide workflows that simplify the interpretation of the EM map in terms of the sequences of dominant antibodies.

      We would like to thank the reviewer for this assessment. While indeed we implement ModelAngelo as published without changes to its algorithms or code, we did add new functionality to Stitch to read the generated output from ModelAngelo and assemble it against known databases of germline-encoded antibody sequences. Of note, ModelAngelo was not primarily developed to determine exact sequence from CryoEM images, but instead to provide input for sequence determination from sequence searches with profile HMMs. Such models are designed to handle ambiguous calls of residues at different positions of a protein sequence. We are of the opinion that one of the main contributions of our study is to finally benchmark the EMPEM approach against known sequences to build a framework for data quality requirements in the future. Our study shows that, in best-case scenarios, EM data alone will provide sequences at 80-90% accuracy. In other words, the sequences are riddled with errors and cannot be taken at face value without orthogonal sequencing data. We demonstrate that mass spectrometry data can fill this requirement and yield much improved accuracy of the sequences even against high backgrounds of unrelated antibody sequences. We are incredibly excited about the prospects and future developments for EMPEM and believe that its integration with orthogonal sequencing approaches like MS is critical moving forward. By developing this pipeline we hope to have taken steps in the right direction.

      Strengths:

      The paper is well-written and clearly argued. The tests constructed seem appropriate and fair and demonstrate that the workflow works pretty well. For a small subset (~17%) of the EMPEM maps analysed the workflow was able to get convincing assignments of the V-genes.

      Thanks for the kind assessment.

      Weaknesses:

      The AI methods used are not a substitute for high quality data and at present very few of the results obtained from EMPEM will be of sufficient quality to robustly assign the sequence of the antibody. However, rather more are likely to be good enough, especially in combination with MS data, to provide a pretty good indication of the V-gene family.

      We fully agree with the assessment of the reviewer, as this is a general limitation of the EMPEM field. If anything, we hope our benchmark study and developed pipeline to integrate with MS-based sequencing data have more clearly established the current limitations of the technique and the requirements/prospects for orthogonal sequencing data to fill the missing gaps.

      Reviewer #2 (Public review):

      In this manuscript, the authors seek to demonstrate that it is possible to sequence antibody variable domains from cryoEM reconstructions in combination with bottom-up LC-MSMS. In particular, they extract de novo sequences from single particle-cryo-EM-derived maps of antibodies using the "deep-learning tool ModelAngelo", which are run through the program Stitch to try to select the top scoring V-gene and construct a placeholder sequence for the CDR3 of both the heavy and light chain of the antibody under investigation. These reconstructed variable domains are then used as templates to guide the assembly of de novo peptides from LC-MS/MS data to improve the accuracy of the candidate sequence.

      Using this approach the authors claim to have demonstrated that "cryoEM reconstructions of monoclonal antigen-antibody complexes may contain sufficient information to accurately narrow down candidate V-genes and that this can be integrated with proteomics data to improve the accuracy of candidate sequences".

      While the approach is clearly a work in progress, the manuscript should be made easier to understand for the general reader. Indeed, I had a hard time understanding the workflow until I got to Fig. 3. So re-ordering the figures, for example, may be helpful in this regard.

      It would be useful to provide additional concrete examples where the described workflow would assist in the elucidation of CDR3's, in cases where this isn't already known. (In the benchmark dataset from the Electron Microscopy Data Bank, all the antibodies and Fabs are presumably known, as is the case for the monoclonal antibody CR3022). I am having difficulty envisioning how one would prepare samples from actual plasma samples that would be appropriate for single particle cryo-EM and MS data on dominant antibodies of interest. In my experience, most of these samples tend to be quite complex mixtures. So additional discussion of this point would be helpful.

      We would like to thank the reviewer for their kind and critical assessment of our work. We have adopted the suggestion to reorder the graphical material, such that the workflow schematic is now Figure 1 in the main text. We hope this will improve the readability.

      Regarding the concrete examples where the workflow could aid in elucidating CDR3 sequences, we would like to refer to all published EMPEM studies and in particular those highlighted in Figure 6. We are also actively working to integrate EMPEM data with MS-based sequencing on novel samples, but those will be subject of later studies. We have added additional discussion regarding the experimental feasibility of the approach. We have highlighted several milestone results where functional antibodies were reconstructed from EMPEM and/or MS data. In the discussion we write:

      “While sample complexity remains an important bottleneck, and questions remain about the dynamic range of the true serum antibody repertoire and the depth of coverage from these novel experimental approaches, several studies have recently reached the important milestone of reconstructing functional antibodies from direct measurements of the secreted serum components.” (see references in manuscript)

      “We believe that both EMPEM and MS-based polyclonal antibody sequencing are still limited to the top 1-10 antibodies in the polyclonal mixture. The EMPEM approach is biased towards bigger and well-ordered target antigens, which calls for additional complementary approaches like HDX-MS for a comprehensive polyclonal epitope mapping exercise.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Line 172: I am surprised the heavy chain is not worse than the light chain

      We have added the following sentence:

      “The length of the complete antigen binding loops was estimated with an average error of 0.5 ± 3.3 or 1.7 ± 6.0 residues for heavy and light chain, with average sequence identities of 0.63 and 0.41. While CDRH3 is the more challenging region in MS-based approaches to antibody sequencing, we believe that the moderately better length and sequence accuracy of CDRH3 compared to CDRL3 in ModelAngelo output reflects the CDRH3’s notoriously tight involvement in antigen binding, hence a greater relative stability in the antibody-antigen complex, resulting in better order in the reconstructed EM density maps.”

      Line 175: Global FSC is not going to be useful. Why not use a local value?

      We agree that local resolution estimates would be more appropriate; that is exactly why we added this remark to our initial analysis. However, local resolution estimates are non-trivial and raise the question about ‘how local’ we need to estimate the quality of the map (see for instance https://doi.org/10.1016/j.sbi.2020.06.005). At present, we believe that the required work for this local resolution analysis is not warranted, only to arrive at the rather intuitive if not tautological conclusion that a better map quality translates into more accurate sequences. While we agree that a better quantitative understanding of the data requirements for EMPEM could benefit the field, we opted to leave this, especially considering that the Stitch alignment score is already a good alternative predictor of sequence accuracy compared to map resolution as demonstrated in Figure 3.

      Line 259: 'of the 23 maps' .... Actually there were 46 maps originally, so I feel this is a tad misleading.

      The statistic of ‘46 total’ was added to the text.

    1. eLife Assessment

      This valuable study presents an interesting analysis of the role of the polyamine precursor putrescine in the pili-dependent surface motility of a laboratory strain of Escherichia coli. The overall data convincingly demonstrate a role in this case. This study presents interesting findings for those studying uropathogenic bacteria, and those studying bacterial polyamine function.

    2. Reviewer #2 (Public review):

      Summary:

      Mehta et al., in constructing E. coli strains unable to synthesize polyamines, noted that strains deficient in putrescine synthesis showed decreased movement on semisolid agar. They show that strains incapable of synthesizing putrescine have decreased expression of Type I pilin and, hence, decreased ability to perform pilin-dependent surface motility.

      Strengths:

      The authors characterize the specific polyamine pathways that are important for this phenomenon. RNAseq provides a detailed overview of gene expression in the strain lacking putrescine. They rule out potential effects of pilin phase variation on the phenotype. The data suggest homeostatic control of polyamine synthesis and metabolic changes in response to putrescine.

      Weaknesses:

      The authors do not, in the end, uncover the molecular details of pilin expression per se, but that would require significantly more analyses and data; the mechanisms of pilin regulation are complicated and still not completely understood.

    3. Reviewer #3 (Public review):

      Summary:

      This study by Mehta et al. describes the mechanisms behind the observation that putrescine biosynthesis mutants in Escherichia coli strain W3110 are affected in surface motility. The manuscript shows that the surface motility phenotype is dependent on Type I fimbriae and that putrescine levels affect the expression level of fimbriae. The results further suggest that without putrescine, the metabolism of the cell is shifted towards production of putrescine and away from energy metabolism.

      Strengths:

      The authors show the effect of putrescine on the regulation of type I fimbriae using various strategies (mutants, addition of exogenous, RNA seq, etc.). All experiments converge to the same conclusion that an optimal level of putrescine is needed.

      Weakness:

      The authors use one isolate of E. coli strain W3110, that contains an insertion in fimE which controls the expression of type I fimbriae. The insertion in fimE likely modifies the ratio of cells expressing fimbriae in the population, and it would be important to confirm the results in other isolates or other strains.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Alternate explanations for major conclusions.

      The major conclusions are (a) surface motility of W3110 requires pili which is not novel, (b) pili synthesis and pili-dependent surface motility require putrescine — 1 mM is optimal, and 4 mM is inhibitory, and (c) the existence of a putrescine homeostatic network that maintains intracellular putrescine that involves compensatory mechanisms for low putrescine, including diversion of energy generation toward putrescine synthesis.

      Conclusion a: Reviewer 3 suggests that the mutant may have lost surface motility because of outer surface structures that actually mediate motility but are co-regulated with or depend on pili synthesis. The reviewer explicitly suggests flagella as the alternate appendage, although flagella and pili are reciprocally regulated. Most experiments were performed in a ΔfliC background, which lacks the major flagella subunit, in order to prevent the generation of fast-moving flagella-dependent variants. Furthermore, no other surface structure that could mediate surface motility is apparent in the electron microscope images. This observation does not definitively rule out this possibility, especially because of the large transcriptomic changes with low putrescine. Our explanation is the simplest.

      Conclusion b, first comment: Reviewer 1 states that “it is not possible to conclude that the effects of gene deletions to biosynthetic, transport or catabolic genes on pili-dependent surface motility are due to changes in putrescine levels unless one takes it on faith that there must be changes to putrescine levels.” The comment ignores both the nutritional supplementation and the transcript changes that strongly suggest compensatory mechanisms for low putrescine. Why compensate if the putrescine concentration does not change? The reviewer then implicitly acknowledges changes in putrescine content: “it is important to know how much putrescine must be depleted in order to exert a physiological effect”.

      Conclusion b, second comment: Reviewer 1 proposes that agmatine accumulation can account for some of the observed properties, but which property is not specified. With respect to motility, agmatine accumulation cannot account for motility defects because motility is impaired in (a) a speA mutant which cannot make agmatine and (b) a speC speF double mutant which should not accumulate agmatine. With respect to the transcriptomic results, even if high agmatine is the reason for some transcript changes, the results still suggest a putrescine homeostasis network.

      Conclusion c: the reviewers made no comments on the RNAseq analysis or the interpretation of the existence of a homeostatic network.

      Additional experiments proposed.

      Complementation. Reviewers 1 and 3 suggested complementation experiments, but the latter states that nutritional supplementation strengthens our arguments. The most relevant complementation is with speB. We tried complementation and found that our control plasmid inhibited motility by increasing the lag time before movement commenced. A plasmid with speB did stimulate motility relative to the control plasmid, but movement with the speB plasmid took 4 days, while wild-type movement took 1.5 days. We think that interpretation of this result is ambiguous. We did not systematically search for plasmids that had no effect on motility.

      The purpose of complementation is to determine whether a second-site mutation is the actual cause of the motility defect. In this case, the artifact is that an alteration in polyamine metabolism is not the cause of the defect. However, external putrescine reverses the effects on motility and pili synthesis in the speB mutant. This result is inconsistent with a second-site mutation. Still, we agree that complementation is important, and because of our difficulties, we tested numerous mutants with defects in polyamine metabolism. The results present an interpretable and coherent pattern. For example, if putrescine is not the regulator, then mutants in putrescine transport and catabolism should have had no effect. Every single mutant is consistent with a role in movement and pili synthesis. The simplest explanation is that putrescine affects movement and pili synthesis.

      Phase variation. Reviewer 2 noted that we did not discuss phase variation. The comment came from the observation that the speB mutant had fewer fimB transcripts which could explain the loss of motility. The reviewer also suggested a simple experiment, which we performed and found that putrescine does not control phase variation. We present those results in the supplemental material. Our discussion of this topic includes a major qualification.

      Testing of additional strains. Published results from another lab showed that surface motility of MG1655 requires spermidine instead of putrescine (PMID 19493013 and 21266585). MG1655 and the W3110 that we used in our study are E. coli K-12 derivatives and phylogenetic group A. Any number of changes in enzymes that affect intracellular putrescine concentration could result in different responses to putrescine. We are currently studying pili synthesis and motility in other strains. While that study is incomplete, loss of speB in a strain of phylogenetic group D eliminates no surface motility. This work was intended as our initial analysis and the focus was on a single strain.

      Measuring intracellular polyamines. We felt that we had provided sufficient evidence to conclude that putrescine controls pili synthesis and putrescine concentrations are lower in the speB mutant: the nutritional supplementation, the lower levels of transcripts for putrescine catabolic enzymes which require putrescine for their expression strongly suggest lower putrescine in a mutant lacking a putrescine biosynthesis gene, and a transcriptomic analysis that found the speB mutant had transcript changes to compensate for low putrescine. We understand the importance of measuring intracellular polyamines. We are currently examining the quantitative relationship between intracellular polyamines and pili synthesis in multiple strains which respond differently to loss of speB.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The authors should measure putrescine, agmatine, cadaverine, and spermidine levels in their gene deletion strains.

      Polyamine concentration measurements will be part of a separate study on polyamine control of pili synthesis of a uropathogenic strain. A comparison is essential, and the results from W3110 will be part of that study.

      Reviewer #2 (Recommendations for the authors):

      (1) Line 28. Your statements about urinary tract infections are pure speculation. They are fine for the discussion, but should not be in the abstract.

      The abstract from line 27 on has been reworked. The comment of the reviewer is fair.

      (2) Line 65. Do we need this discussion about the various strains? If you keep it, you should point out that they were all W3110 strains. But you could just say that you confirmed that your background strain can do PDSM (since you are also not showing any data for the other isolates). Discussing the various strains implies that you are not confident in your strain and raises the question of why you didn't use a sequenced wt MG1655, or something like that.

      This section has been reworked. Our strain of W3110 has an insertion in fimB which is relevant for movement but does not affect our results. The insertion limits our conclusions about phase variation. We want to point out that strain variations are large. We also sequenced our strain of W3110.

      (3) Related. You occasionally use "W3110-LR" to designate the wild type. You use this or not, but be consistent throughout the text.

      Fixed

      (4) Line 99. Does eLife allow "data not shown"?  

      (5) Line 119. As you note, the phenotype of the puuA patA double mutant is exactly the opposite of what one would expect. Although you provide additional evidence that high levels also inhibit motility, complementing the double mutant would provide confidence that the strain is correct.

      We rapidly ran into issues with complementation which are discussed in public responses to reviewer comments.

      (6) Figure 6C. Either you need to quantify these data or you need a better picture.

      The files were corrupted. It was repeated several times, but we lost the other data.

      (7) Figure 7. Label panels A and B to indicate that these strains are speB. Also, you need to switch panels C and D to match the order of discussion in the manuscript.

      Done

      (8) Line 134. Is there a statistically significant difference in the ELISA between 1 and 4 mM? You need to say one way or the other.

      There is no statistically significant difference, and this has been added to the paper.

      (9) Figure 10C. You need to quantify these data.

      Quantification added as an extra panel.

      (10) Line 164. You include H-NS in the group of "positive effectors that control fim operon expression" and you reference Ecocyc, rather than any primary reference. Nowhere in the manuscript do you mention phase variation. In the speB mutant, you see decreased fimB, increased fimE, and decreased hns expression. My interpretation of the literature suggests that this would drive the fim switch to the off-state. This could certainly explain some of the results. It is also easily measurable with PCR. This might require testing cells scraped directly from the plates.

      The experiments were performed. There is no need to scrape cells from plates because the fimB result from RNAseq was from a liquid culture, and the prediction would be that the phase-locking should be evident in these cells.

      (11) Figure 10. Likewise, do you know that your hns mutant is not locked in the off-state? Granted, the original hns mutants (pilG) showed increased rates of switching, but growth conditions might matter.

      We also did phase variation for the hns mutant and the hns mutant was not phase locked. This result is shown. In addition to growth conditions, the strain probably matters.

      (12) Line 342. You describe the total genome sequencing of W3110, yet this is not mentioned anywhere else in the manuscript.

      It is now

      Minor points:

      (13) Line 192. "One of the most differentially expressed genes...".

      (14) Line 202. "...implicates extracellular putrescine in putrescine homeostasis."

      (15) Line 209. "...potential pili regulators...".

      (16) You are using a variety of fonts on the figures. Pick one.

      (17) Figure 9A. It took me a few minutes to figure out the labeling for this figure and I was more confused after reading the legend. It would be simpler to independently label red triangles, blue triangles, red circles, and blue circles.

      (18) Figure 9B and 10. The reader can likely figure out what W3110_1.0_3 means, but more straightforward labeling would be better, or you need to define these labels.

      All points were addressed and fixed.

      Reviewer #3 (Recommendations for the authors):

      Other comments:

      (1) Please go through the figures and the reference to figures in the text, as they often do not refer to the right panel (ex: figures 2 and 7 for instance). In the text, please homogenize the reference to figures (Figure 2C vs Figure 3). To help compare motility experiments between figures, please use the same scale in all figures.

      This has been fixed.

      (2) Lines 65-70: I am not sure I get the reason behind choosing the W3110 strain from your lab stock. In what background were the initial mutants constructed (from l.64-65)? Were the nine strains tested, all variations of W3110? If so, is the phenotype described in the manuscript robust in all strains?

      We have provided more explanation. W3110 was the most stable: insertions that allowed flagella synthesis in the presence of glucose were frequent. We deleted the major flagella subunit for most experiments. Before introduction of the fliC deletion, we needed to perform experiments 10 times so that fast-moving variants, which had mutationally altered flagella synthesis, did not complicate results.

      (3) Line 82-84: As stated in the public review, I think more controls are needed before making this conclusion, especially as type I fimbriae are usually involved in sessile phenotypes.

      Response provided in the public response.

      (4) In Figure 3: Changing the order of the image to follow the text would make the figure easier to follow.

      Fixed as requested

      (5) Lines 100-101: simultaneous - the results presented here do not support this conclusion. In Figure 4b, the addition of putrescine to speB mutants is actually not different from WT. From the results, it seems like one of biosynthesis or transport is needed, but it's not clear if both are needed simultaneously. For this, a mutant with no biosynthesis and no transport is needed and/or completely non-motile mutants would be needed to compare.

      We disagree. If there are two pathways of putrescine synthesis and both are needed, then our conclusion follows.

      (6) Lines 104-105: '... because E. coli secretes putrescine.' - not sure why this statement is there, as most transporters tested after are importers of putrescine? It is also not clear to me if putrescine is supplemented in the media in these experiments. If not, is there putrescine in the GT media?

      Good points, and this section has been reworded to clarify these issues. Some of the material was moved to the discussion.

      (7) Line 109: 'We note that potE and plaP are more highly expressed than potE and puuP...' - first potE should be potF?

      This has been corrected.

      (8) Figure 8: What is the difference between the TEM images in Figure 1 and here? The WT in Figure 1 does show pili without the supplementation unless I'm missing something here. Please specify.

      The reviewer means Figure 2 and not Figure 1. Figure 2 shows a wild-type strain which has both putrescine anabolic pathways while Figure 8 is the ΔspeB strain which lacks one pathway.

      (9) Lines 160-162: Transcripts for the putrescine-responsive puuAP and puuDRCBE operons, which specify genes of the major putrescine catabolic pathway, were reduced from 1.6- to 14-fold (FDR ≤ 0.02) in the speB mutant (Supplemental Table 1), which implies lower intracellular putrescine. I might not get exactly the point here. If the catabolic pathways are repressed in the speB mutant, then there will be less degradation which means more putrescine!?

      Expression of these genes is a function of intracellular putrescine: higher expression means more putrescine. Any discussion of steady putrescine must include the anabolic pathways: the catabolic pathways do not determine the intracellular putrescine, they are a reflection of intracellular putrescine.

      (10) Lines 162-163: Deletion of speB reduced transcripts for genes of the fimA operon and fimE, but not of fimB. It seems that the results suggest the opposite a reduction of fimB but not fimE!?

      The reviewer is correct, and it is our mistake; the text now states what is in the figure.

    1. eLife Assessment

      This important study analyzes the effect of heat treatment on phage-bacterial interactions and convincingly shows that prior heat exposure alters the bacterial cell envelope, enhancing persistence and bacterial survival when exposed to lytic phages. The study will interest researchers working on antibiotic resistance, tolerance, and phage therapy.

    2. Reviewer #1 (Public review):

      Summary:

      In this interesting and original paper, the authors examine the effect that heat stress can have on the ability of bacterial cells to evade infection by lytic bacteriophages. Briefly, the authors show that heat stress increases the tolerance of Klebsiella pneumoniae to infection by the lytic phage Kp11. They also argue that this increased tolerance facilitates the evolution of genetically encoded resistance to the phage. In addition, they show that heat can reduce the efficacy of phage therapy. Moreover, they define a likely mechanistic reason for both tolerance and genetically encoded resistance. Both lead to a reorganization of the bacterial cell envelope, which reduces the likelihood that phage can successfully inject their DNA.

      Strengths:

      I found large parts of this paper well-written and clearly presented. I also found many of the experiments simple yet compelling. For example, the experiments described in Figure 3 clearly show that prior heat exposure can affect the efficacy of phage therapy. In addition, the experiments shown in Figures 4 and 6 clearly demonstrate the likely mechanistic cause of this effect. The conceptual Figure 7 is clear and illustrates the main ideas well. I think this paper would work even without its central claim, namely that tolerance facilitates the evolution of resistance. The reason is that the effect of environmental stressors on stress tolerance has to my knowledge so far only been shown for drug tolerance, not for tolerance to an antagonistic species.

      Weaknesses:

      I did not detect any weaknesses that would require a major reorganization of the paper, or that may require crucial new experiments. However, the paper needs some work in clarifying specific and central conclusions that the authors draw. More specifically, it needs to improve the connection between what is shown in some figures, how these figures are described in the caption, and how they are discussed in the main text. This is especially glaring with respect to the central claim of the paper from the title, namely that tolerance facilitates the evolution of resistance. I am sympathetic to that claim, especially because this has been shown elsewhere, not for phage resistance but for antibiotic resistance. However, in the description of the results, this is perhaps the weakest aspect of the paper, so I'm a bit mystified as to why the authors focus on this claim. As I mentioned above, the paper could stand on its own even without this claim.

      More specific examples where clarification is needed:

      (1) A key figure of the paper seems to be Figure 2D, yet it was one of the most confusing figures. This results from a mismatch between the accompanying text starting on line 92 and the figure itself. The first thing that the reader notices in the figure itself is the huge discrepancy between the number of viable colonies in the absence of phage infection at the two-hour time point. Yet this observation is not even mentioned in the main text. The exclusive focus of the main text seems to be on the right-hand side of the figure, labeled "+Phage". It is from this right-hand panel that the authors seem to conclude that heat stress facilitates the evolution of resistance. I find this confusing, because there is no difference between the heat-treated and non-treated cells in survivorship, and it is not clear from these data that survivorship is caused by resistance, not by tolerance/persistence. (The difference between tolerance and resistance has only been shown in the independent experiments of Figure 1B.) Figure 2F supports the resistance claim, but it is not one of the strongest experiments of the paper, because the authors used only "turbidity" as an indicator of resistance. In addition, the authors performed the experiments described therein at small population sizes to avoid the presence of resistance mutations. But how do we know that the turbidity they describe does not result from persisters?

      I see three possibilities to address these issues. First, perhaps this is all a matter of explaining and motivating this particular experiment better. Second, the central claim of the paper may require additional experiments. For example, is it possible to block heat induced tolerance through specific mutations, and show that phage resistance does not evolve as rapidly if tolerance is blocked? A third possibility is to tone down the claim of the paper, and make it about heat tolerance rather than the evolution of heat resistance.

      A minor but general point here is that in Figure 2D and in other figures, the labels "-phage" and "+phage" do not facilitate understanding, because they suggest that cells in the "-phage" treatment have not been exposed to phage at all, but that is not the case. They have survived previous phage treatment and are then replated on media lacking phage.

      (2) Another figure with a mismatch between text and visual materials is Figure 5, specifically Figures 5B-F. The figure is about two different mutants, and the text does not even mention how these mutants were identified, for example in different or the same replicate populations. What is more, the two mutants are not discussed at all in the main text. That is, the text starting on line 221 discusses these experiments as if there was only one mutant. This is especially striking as the two mutants behave very differently, as, for example, in Figure 5C. Implicitly, the text talks about the mutant ending in "...C2", and not the one ending in "...C1". To add to the confusion, the text states that the (C2) mutant shows a change in the pspA gene, but in Figure 5F, it is the other (undiscussed) mutant that has a mutation in this gene. Only pspA is discussed further, so what about the other mutants? More generally, it is hard to believe that these were the only mutants that occurred in the genome during experimental evolution. It would be useful to give the reader a 2-3 sentence summary of the genetic diversity that experimental evolution generated.

    3. Reviewer #2 (Public review):

      Summary:

      An initial screen of K. pneumoniae pretreated with different stresses identified heat stress as a factor that protects against infection by the lytic phage Kp11. Subsequent experiments showed that this protection is mediated not by an increase in phage-resistant bacteria but by an increase in a transiently phage-tolerant population, which the authors term bacteriophage persistence in analogy to antibiotic persistence. They then showed that phage persistence induced by heat shock enhanced the evolution of bacterial resistance against the phage. The same trait was observed using other lytic phages, their combinations, and two clinical strains, as well as E. coli and two T phages; hence the phenomenon may be widespread in enterobacteria.

      Next, the authors investigated the mechanism of heat-induced phage persistence, determining that phage adsorption was not affected but phage DNA internalization was impaired by the heat pretreatment, likely due to alterations in the bacterial envelope, including the downregulation of envelope proteins and of LPS; furthermore, heat-treated bacteria were less sensitive to polymyxins due to the decrease in LPS.

      Finally, cyclic exposure to heat stress allowed the isolation of a mutant that was resistant to heat treatment, polymyxins, and lytic phage. That mutant had alterations in the PspA protein that conferred a gain of function and promoted the reduction of capsule production and loss of its structure. Nevertheless, this mutant was severely impaired in immune evasion, as it was easily cleared from mouse blood, evidencing the tradeoffs between phage/heat and antibiotic resistance and the ability to counteract the immune response.

      Strengths:

      The experimental design and the sequence in which the experiments are presented are ideal for understanding the study, and the conclusions are supported by the findings. The discussion also points out the relevance of the work, particularly for the effectiveness of phage therapy, and allows the design of strategies to improve that effectiveness.

      Weaknesses:

      In its present form, the manuscript lacks some relevant previous work that explored the role of heat stress in phage susceptibility, antibiotic susceptibility, tradeoffs between phage resistance and resistance against other kinds of stress, virulence, etc., and the fact that exposure to lytic phages induces antibiotic persistence.

    4. Reviewer #3 (Public review):

      PspA, a key regulator in the phage shock protein system, functions as part of the envelope stress response system in bacteria, preventing membrane depolarization and ensuring envelope stability. This protein has been associated with the quorum-sensing network and biofilm formation. (Moscoso M., Garcia E., Lopez R. 2006. Biofilm formation by Streptococcus pneumoniae: role of choline, extracellular DNA, and capsular polysaccharide in microbial accretion. J. Bacteriol. 188:7785-7795; Vidal JE, Ludewick HP, Kunkel RM, Zähner D, Klugman KP. The LuxS-dependent quorum-sensing system regulates early biofilm formation by Streptococcus pneumoniae strain D39. Infect Immun. 2011 Oct;79(10):4050-60.)

      The study is interesting and very well developed.

      (1) Could the authors develop experiments about the relationship between Quorum Sensing and this protein?

      (2) It would be interesting to analyze the link to phage infection and heat stress in relation to Quorum. The authors could study QS regulators or AI2 molecules.

      (3) Include the proteins or genes in a table or figure from lytic phage Kp11 (GenBank: ON148528.1).

    1. eLife Assessment

      This important study leverages the power of Drosophila genetics and sparsely-labeled neurons to propose an intriguing new model for neuronal injury signaling. The authors present convincing evidence to show that the somatic response to axonal injury can be suppressed if the injury is not complete, suggesting the presence of a new mode of injury 'integration.' While the underlying mechanism of this fascinating observation has yet to be determined, the phenomenon itself will be of broad significance in the field.

    2. Reviewer #1 (Public review):

      This manuscript presents an interesting exploration of the potential activation mechanisms of DLK following axonal injury. While the experiments are beautifully conducted and the data are solid, I feel that there is insufficient evidence to fully support the conclusions made by the authors.

      In this manuscript, the authors exclusively use the puc-lacZ reporter to determine the activation of DLK. This reporter has been shown to be induced when DLK is activated. However, there is insufficient evidence to confirm that the absence of reporter activation necessarily indicates that DLK is inactive. As with many MAP kinase pathways, the DLK pathway can be locally or globally activated in neurons, and the level of DLK activation may depend on the strength of the stimulation. This reporter might only reflect strong DLK activation and may not be turned on if DLK is weakly activated.

      As noted by the authors, DLK has been implicated in both axon regeneration and degeneration. Following axotomy, DLK activation can lead to the degeneration of distal axons, where synapses are located. This raises an important question: how is DLK activated in distal axons? The authors might consider discussing the significance of this "synapse connection-dependent" DLK activation in the broader context of DLK function and activation mechanisms.

    3. Reviewer #2 (Public review):

      Summary:

      The authors study a panel of sparsely labeled neuronal lines in Drosophila that each form multiple synapses. Critically, each axonal branch can be injured without affecting the others, allowing the authors to differentiate between injuries that affect all axonal branches versus those that do not, creating spared branches. This is a highly powerful model. Axonal injuries are known to cause Wnd (mammalian DLK)-dependent retrograde signals to the cell body, culminating in a transcriptional response. This work identifies a fascinating new phenomenon that this injury response is not all-or-none. If even a single branch remains uninjured, the injury signal is not activated in the cell body. The authors rule out that this could be due to changes in the abundance of Wnd (perhaps if incrementally activated at each injured branch) by Hiw, Wnd's known negative regulator. Thus there is both a yet-undiscovered mechanism to regulate Wnd signaling, and more broadly a mechanism by which the neuron can integrate the degree of injury it has sustained. It will now be important to tease apart the mechanism(s) of this fascinating phenomenon. But even absent a clear mechanism, this is a new biology that will inform the interpretation of injury signaling studies across species.

      Strengths:

      - A conceptually beautiful series of experiments that reveal a fascinating new phenomenon is described, with clear implications (as the authors discuss in their Discussion) for injury signaling in mammals.

      - Suggests a new mode of Wnd regulation, independent of Hiw.

      Weaknesses:

      - The use of a somatic transcriptional reporter for Wnd activity is powerful; however, the reporter indicates whether the transcriptional response was activated, not whether the injury signal was received. It remains possible that Wnd is still activated in the case of a spared branch, but that this activation is either local within the axons (impossible to determine in the absence of a local reporter) or that the retrograde signal was indeed generated but it was somehow insufficient to activate transcription when it entered the cell body. This is more of a mechanistic detail (and likely an extreme technical challenge to assess) and should not detract from the overall importance of the study.

      - That the protective effect of a spared branch is independent of Hiw, the known negative regulator of Wnd, is fascinating. But this leaves open a key question: what is the signal?

      Comments on revisions:

      I appreciate your discussion about the potential bi-modal regulation of the puckered transcriptional reporter and think that readers would benefit from a short discussion of this.

    4. Reviewer #3 (Public review):

      Summary:

      This manuscript seeks to understand how nerve injury-induced signaling to the nucleus is influenced, and it establishes a new location where these principles can be studied. By identifying and mapping specific bifurcated neuronal innervations in the Drosophila larvae, and using laser axotomy to localize the injury, the authors find that sparing a branch of a complex muscular innervation is enough to impair Wallenda-puc (analogous to DLK-JNK-cJun) signaling that is known to promote regeneration. It is only when all connections to the target are disconnected that cJun-transcriptional activation occurs.

      Overall, this is a thorough and well-performed investigation of the mechanism of spared-branch influence on axon injury signaling. The findings on control of wnd are important because this is a very widely used injury signaling pathway across species and injury models. The authors present detailed and carefully executed experiments to support their conclusions. Their effort to identify the control mechanism is admirable and will be of aid to the field as they continue to try to understand how to promote better regeneration of axons.

      Strengths:

      The paper does a very comprehensive job of investigating this phenomenon at multiple locations and through both pinpoint laser injury as well as larger crush models. They identify a non-hiw based restraint mechanism of the wnd-puc signaling axis that presumably is originating from the spared terminal. They also present a large list of tests they performed to identify the actual restraint mechanism from the spared branch, which has ruled out many of the most likely explanations. This is an extremely important set of information to report, to guide future investigators in this and other model organisms on mechanisms by which regeneration signaling is controlled (or not).

      Weaknesses:

      While there are many questions raised by these results that are not answered here, including the pathways upstream and downstream of DLK and how the binary switch control of DLK/puc signaling is executed, the model built in this manuscript is valuable to future work going after these important questions.

      Because the conclusions of the paper are focused on a single (albeit well-validated) reporter in different types of motor neurons, it is hard to determine whether the mechanism of spared branch inhibition of regeneration requires wnd-puc (DLK/cJun) signaling, or whether this is a binary/threshold response in all contexts (for example, sensory axons or interneurons). However, the authors point out in their response that there are sensory neuron examples where a spared connection does not block DLK activation. As such, it may not be a universal mechanism but could provide a model for better understanding of DLK control across different contexts.

      Comments on revisions:

      The new panels in Figure 1E do not have Y-axis labels. (mean puc-lacZ intensity?)

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      This manuscript presents an interesting exploration of the potential activation mechanisms of DLK following axonal injury. While the experiments are beautifully conducted and the data are solid, I feel that there is insufficient evidence to fully support the conclusions made by the authors.

      In this manuscript, the authors exclusively use the puc-lacZ reporter to determine the activation of DLK. This reporter has been shown to be induced when DLK is activated.

      However, there is insufficient evidence to confirm that the absence of reporter activation necessarily indicates that DLK is inactive. As with many MAP kinase pathways, the DLK pathway can be locally or globally activated in neurons, and the level of DLK activation may depend on the strength of the stimulation. This reporter might only reflect strong DLK activation and may not be turned on if DLK is weakly activated. The results presented in this manuscript support this interpretation. Strong stimulation, such as axotomy of all synaptic branches, caused robust DLK activation, as indicated by puc-lacZ expression. In contrast, weak stimulation, such as axotomy of some synaptic branches, resulted in weaker DLK activation, which did not induce the puc-lacZ reporter. This suggests that the strength of DLK activation depends on the severity of the injury rather than the presence of intact synapses. Given that this is a central conclusion of the study, it may be worthwhile to confirm this further. Alternatively, the authors may consider refining their conclusion to better align with the evidence presented.

      In Figure 1E we have replotted the puc-lacZ data to show comparisons between different injuries that leave different numbers of spared (or lost) boutons and branches.  We observed no differences between injuries that remove only a small fraction of boutons (injury location (a)) and injuries that remove nearly all of them (injury locations (b) and (c)) and uninjured neurons (Figure 1E). These observations argue against the interpretation that the strength of DLK activation (at least within the cell body) depends on the severity of injury. Rather, puc-lacZ induction appears to be bimodal. It is either induced (in various injuries that remove all synaptic boutons), or not induced, including in injuries that spared only a small fraction of the total boutons. We therefore think that the presence of a remaining synaptic connection rather than the extent of the injury per se is a major determinant of whether the cell body component of Wnd signaling can be activated. 

      The reviewer (and others) fairly point out that our current study focuses on puc-lacZ as a reporter of Wnd signaling in the cell body. We consider this to be a downstream integration of events in axons that are more challenging to detect. It is striking that this integration appears strongly sensitized to the presence of spared synaptic boutons. Examination of Wnd’s activation in axons and synapses is a goal for our future work.

      As noted by the authors, DLK has been implicated in both axon regeneration and degeneration. Following axotomy, DLK activation can lead to the degeneration of distal axons, where synapses are located. This raises an important question: how is DLK activated in distal axons? The authors might consider discussing the significance of this "synapse connection-dependent" DLK activation in the broader context of DLK function and activation mechanisms.

      While it has been noted that inhibition of DLK can mildly delay Wallerian degeneration (Miller et al., 2009), this does not appear to be the case for retinal ganglion cell axons following optic nerve crush (Fernandes et al., 2014). It is also not the case for Drosophila motoneurons and NMJ terminals following peripheral nerve injury (Xiong et al., 2012; Xiong and Collins, 2012). Instead, overexpression of Wnd or activation of Wnd by a conditioning injury leads to an opposite phenotype - an increase in resiliency to Wallerian degeneration for axons that have been previously injured (Xiong et al., 2012; Xiong and Collins, 2012). The downstream outcome of Wnd activation is highly dependent on the context; it may be an integration of the outcomes of local Wnd/DLK activation in axons with downstream consequences of nuclear/cell body signaling. The current study suggests some rules for the cell body signaling; however, how Wnd is regulated at synapses and why it promotes degeneration in some circumstances but not others are important future questions.

      Regarding the reviewer’s suggestion, it is interesting to consider DLK’s potential contributions to the loss of NMJ synapses in a mouse model of ALS (Le Pichon et al., 2017; Wlaschin et al., 2023). Our findings suggest that the synaptic terminal is an important locus of DLK regulation, while dysfunction of NMJ terminals is an important feature of the ‘dying back’ hypothesis of disease etiology (Dadon-Nachum et al., 2011; Verma et al., 2022). We propose that the regulation of DLK at synaptic terminals is an important area for future study, and may reveal how DLK might be modulated to curtail disease progression. Of note, DLK inhibitors are in clinical trials (Katz et al., 2022; Le et al., 2023; Siu et al., 2018), but at least some have been paused due to safety concerns (Katz et al., 2022). Further understanding of the mechanisms that regulate DLK is needed to determine whether and how DLK and its downstream signaling can be tuned for therapeutic benefit.

      Reviewer #2 (Public review):

      Summary:

      The authors study a panel of sparsely labeled neuronal lines in Drosophila that each form multiple synapses. Critically, each axonal branch can be injured without affecting the others, allowing the authors to differentiate between injuries that affect all axonal branches versus those that do not, creating spared branches. Axonal injuries are known to cause Wnd (mammalian DLK)-dependent retrograde signals to the cell body, culminating in a transcriptional response. This work identifies a fascinating new phenomenon that this injury response is not all-or-none. If even a single branch remains uninjured, the injury signal is not activated in the cell body. The authors rule out that this could be due to changes in the abundance of Wnd (perhaps if incrementally activated at each injured branch) by Hiw, Wnd's known negative regulator. Thus there is both a yet-undiscovered mechanism to regulate Wnd signaling, and more broadly a mechanism by which the neuron can integrate the degree of injury it has sustained. It will now be important to tease apart the mechanism(s) of this fascinating phenomenon. But even absent a clear mechanism, this is a new biology that will inform the interpretation of injury signaling studies across species.

      Strengths:

      (1) A conceptually beautiful series of experiments that reveal a fascinating new phenomenon is described, with clear implications (as the authors discuss in their Discussion) for injury signaling in mammals.

      (2) Suggests a new mode of Wnd regulation, independent of Hiw.

      Weaknesses:

      (1) The use of a somatic transcriptional reporter for Wnd activity is powerful; however, the reporter indicates whether the transcriptional response was activated, not whether the injury signal was received. It remains possible that Wnd is still activated in the case of a spared branch, but that this activation is either local within the axons (impossible to determine in the absence of a local reporter) or that the retrograde signal was indeed generated but it was somehow insufficient to activate transcription when it entered the cell body. This is more of a mechanistic detail and should not detract from the overall importance of the study.

      We agree. The puc-lacZ reporter tells us about signaling in the cell body, but whether and how Wnd is regulated in axons and synaptic branches, which we think occurs upstream of the cell body response, remains to be addressed in future studies.

      (2) That the protective effect of a spared branch is independent of Hiw, the known negative regulator of Wnd, is fascinating. But this leaves open a key question: what is the signal?

      This is indeed an important future question, and would still be a question even if Hiw were part of the protective mechanism by the spared synaptic branch. Our current hypothesis (outlined in Figure 4) is that regulation of Wnd is tied to the retrograde trafficking of a signaling organelle in axons. The Hiw-independent regulation complements other observations in the literature that multiple pathways regulate Wnd/DLK (Collins et al., 2006; Feoktistov and Herman, 2016; Klinedinst et al., 2013; Li et al., 2017; Russo and DiAntonio, 2019; Valakh et al., 2013). It is logical for this critical stress response pathway to have multiple modes of regulation that may act in parallel to tune and restrain its activation. 

      Reviewer #3 (Public review):

      Summary:

      This manuscript seeks to understand how nerve injury-induced signaling to the nucleus is influenced, and it establishes a new location where these principles can be studied. By identifying and mapping specific bifurcated neuronal innervations in the Drosophila larvae, and using laser axotomy to localize the injury, the authors find that sparing a branch of a complex muscular innervation is enough to impair Wallenda-puc (analogous to DLK-JNK-cJun) signaling that is known to promote regeneration. It is only when all connections to the target are disconnected that cJun-transcriptional activation occurs.

      Overall, this is a thorough and well-performed investigation of the mechanism of spared-branch influence on axon injury signaling. The findings on control of wnd are important because this is a very widely used injury signaling pathway across species and injury models. The authors present detailed and carefully executed experiments to support their conclusions. Their effort to identify the control mechanism is admirable and will be of aid to the field as they continue to try to understand how to promote better regeneration of axons.

      Strengths:

      The paper does a very comprehensive job of investigating this phenomenon at multiple locations and through both pinpoint laser injury as well as larger crush models. They identify a non-hiw based restraint mechanism of the wnd-puc signaling axis that presumably originates from the spared terminal. They also present a large list of tests they performed to identify the actual restraint mechanism from the spared branch, which has ruled out many of the most likely explanations. This is an extremely important set of information to report, to guide future investigators in this and other model organisms on mechanisms by which regeneration signaling is controlled (or not).

      Weaknesses:

      The weakest data presented by this manuscript is the study of the actual amounts of Wallenda protein in the axon. The authors argue that increased Wnd protein is being anterogradely delivered from the soma, but no support for this is given. Whether this change is due to transcription/translation, protein stability, transport, or other means is not investigated in this work. However, because this point is not central to the arguments in the paper, it is only a minor critique.

      We agree and are glad that the reviewer considers this a minor critique; this is an area for future study. In Supplemental Figure 1 we present differences in the levels of an ectopically expressed GFP-Wnd-kinase-dead transgene, which is strikingly increased in axons that have received a full but not partial axotomy. We suspect this accumulation occurs downstream of the cell body response because of the timing. We observed the accumulations after 24 hours (Figure S1F) but not at early (1-4 hour) time points following axotomy (data not shown). Further study of the local regulation of Wnd protein and its kinase activity in axons is an important future direction.

      As far as the scope of impact: because the conclusions of the paper are focused on a single (albeit well-validated) reporter in different types of motor neurons, it is hard to determine whether the mechanism of spared branch inhibition of regeneration requires wnd-puc (DLK/cJun) signaling in all contexts (for example, sensory axons or interneurons). Is the nerve-muscle connection the rule or the exception in terms of regeneration program activation?

      DLK signaling is strongly activated in DRG sensory neurons following peripheral nerve injury (Shin et al., 2012), despite the fact that sensory neurons have bifurcated axons and their projections in the dorsal spinal cord are not directly damaged by injuries to the peripheral nerve. Therefore, it is unlikely that protection by a spared synapse is a universal rule for all neuron types. However, the molecular mechanisms that underlie this regulation may indeed be shared across different types of neurons but utilized in different ways. For instance, nerve growth factor withdrawal can lead to activation of DLK (Ghosh et al., 2011); however, neurotrophins and their receptors are regulated and implemented differently in different cell types. We suspect that the restraint of Wnd signaling by the spared synaptic branch shares a common underlying mechanism with the restraint of DLK signaling by neurotrophin signaling. Further elucidation of the molecular mechanism is an important next step towards addressing this question.

      Because changes in puc-lacZ intensity are the major readout, it would be helpful to better explain the significance of the amount of puc-lacZ in the nucleus with respect to the activation of regeneration. Is it known that scaling up the amount of puc-lacZ transcription scales functional responses (regeneration or others)? The alternative would be that only a small amount of puc-lacZ is sufficient to efficiently induce relevant pathways (threshold response).

      While induction of puc-lacZ expression correlates with Wnd-mediated phenotypes, including sprouting of injured axons (Xiong et al., 2010), protection from Wallerian degeneration (Xiong et al., 2012; Xiong and Collins, 2012) and synaptic overgrowth (Collins et al., 2006), we have not observed any correlation between the degree of puc-lacZ induction (e.g., modest, medium, or high) and the phenotypic outcomes (sprouting, overgrowth, etc.). Rather, there appears to be a striking all-or-none difference in whether puc-lacZ is induced or not induced. There may indeed be a threshold that can be restrained through multiple mechanisms. We posit in Figure 4 that restraint may take place in the cell body, where it can be influenced by the spared bifurcation.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      This is a beautiful study. Naturally, you're searching now for the underlying mechanism.

      A few questions:

      (1) At present you cannot determine if the Wnd signal is never initiated (when a spared branch is present) or if it gets to the cell body but is incapable of activating the puckered reporter. Is there any optical reporter (JNK activation?) that could differentiate this?

      The reviewer is correct that a tool to detect local activity of JNK kinase in axons would be ideal for probing the mechanisms that underlie our observations. A FRET reporter for JNK kinase activity has been developed and utilized in cultured cells (Fosbrink et al. 2010). It would be interesting to implement this reporter in Drosophila; it would need to be sensitive enough to visualize activity in single Drosophila axons. We have previously noted Wnd-dependent phosphorylated JNK in the cell body of injured motoneurons following nerve crush (Xiong et al., 2010). However, anti-pJNK antibodies detect what appears to be a constitutive signal in uninjured axons that does not appear to be influenced by activation or inhibition of Wnd (Xiong et al., 2010).

      (2) What happens when you injure the axon in a dSarm KO? This is more of a curiosity, not a necessity, but is it the axon dying or the detection of the injury itself?

      We have tested whether overexpression of Nmnat or the WldS transgene, which inhibit Wallerian degeneration of injured axons, affect the induction of puc-lacZ following nerve injury. This manipulation has no effect on puc-lacZ expression in uninjured animals, and also has no effect on the induction of puc-lacZ following peripheral nerve crush (TJ Waller, personal communication).

      (3) Are Wnd rescue experiments possible in this context? Would be an interesting place to do Wnd structure-function and compare it to the synaptic work.

      This is not possible with current reagents. Expression of wild type wnd cDNA under the Gal4/UAS promoter leads to strong induction of puc-lacZ in uninjured animals, even when weak Gal4 driver lines are used (Xiong et al., 2012, 2010). Similar observations of constitutively active signaling have been made in expression studies of DLK in mammalian cells (Hao et al., 2016; Huntwork-Rodriguez et al., 2013; Nihalani et al., 2000; and data not shown). These and other observations suggest that the levels of Wnd/DLK protein are tightly controlled by posttranscriptional mechanisms. Delineation of sequences within Wnd/DLK that are required for its regulation would be helpful for addressing this question.

      This will be required reading in my lab.

      That is an honor. We look forward to help from the field to understand how and why this pathway is restrained at synapses. Your students may bring new ideas to the table.

      Reviewer #3 (Recommendations for the authors):

      Piezo is spelled incorrectly in the supplemental table in multiple places.

      Thank you for pointing this out! We have made the correction.

      References cited (in rebuttal)

      Collins CA, Wairkar YP, Johnson SL, DiAntonio A. 2006. Highwire restrains synaptic growth by attenuating a MAP kinase signal. Neuron 51:57–69.

      Dadon-Nachum M, Melamed E, Offen D. 2011. The “dying-back” phenomenon of motor neurons in ALS. J Mol Neurosci 43:470–477.

      Feoktistov AI, Herman TG. 2016. Wallenda/DLK protein levels are temporally downregulated by Tramtrack69 to allow R7 growth cones to become stationary boutons. Development 143:2983–2993.

      Fernandes KA, Harder JM, John SW, Shrager P, Libby RT. 2014. DLK-dependent signaling is important for somal but not axonal degeneration of retinal ganglion cells following axonal injury. Neurobiol Dis 69:108–116.

      Ghosh AS, Wang B, Pozniak CD, Chen M, Watts RJ, Lewcock JW. 2011. DLK induces developmental neuronal degeneration via selective regulation of proapoptotic JNK activity. J Cell Biol 194:751–764.

      Hao Y, Frey E, Yoon C, Wong H, Nestorovski D, Holzman LB, Giger RJ, DiAntonio A, Collins C. 2016. An evolutionarily conserved mechanism for cAMP elicited axonal regeneration involves direct activation of the dual leucine zipper kinase DLK. Elife 5. doi:10.7554/eLife.14048

      Huntwork-Rodriguez S, Wang B, Watkins T, Ghosh AS, Pozniak CD, Bustos D, Newton K, Kirkpatrick DS, Lewcock JW. 2013. JNK-mediated phosphorylation of DLK suppresses its ubiquitination to promote neuronal apoptosis. J Cell Biol 202:747–763.

      Katz JS, Rothstein JD, Cudkowicz ME, Genge A, Oskarsson B, Hains AB, Chen C, Galanter J, Burgess BL, Cho W, Kerchner GA, Yeh FL, Ghosh AS, Cheeti S, Brooks L, Honigberg L, Couch JA, Rothenberg ME, Brunstein F, Sharma KR, van den Berg L, Berry JD, Glass JD. 2022. A Phase 1 study of GDC-0134, a dual leucine zipper kinase inhibitor, in ALS. Ann Clin Transl Neurol 9:50–66.

      Klinedinst S, Wang X, Xiong X, Haenfler JM, Collins CA. 2013. Independent pathways downstream of the Wnd/DLK MAPKKK regulate synaptic structure, axonal transport, and injury signaling. J Neurosci 33:12764–12778.

      Le K, Soth MJ, Cross JB, Liu G, Ray WJ, Ma J, Goodwani SG, Acton PJ, Buggia-Prevot V, Akkermans O, Barker J, Conner ML, Jiang Y, Liu Z, McEwan P, Warner-Schmidt J, Xu A, Zebisch M, Heijnen CJ, Abrahams B, Jones P. 2023. Discovery of IACS-52825, a potent and selective DLK inhibitor for treatment of chemotherapy-induced peripheral neuropathy. J Med Chem 66:9954–9971.

      Le Pichon CE, Meilandt WJ, Dominguez S, Solanoy H, Lin H, Ngu H, Gogineni A, Sengupta Ghosh A, Jiang Z, Lee S-H, Maloney J, Gandham VD, Pozniak CD, Wang B, Lee S, Siu M, Patel S, Modrusan Z, Liu X, Rudhard Y, Baca M, Gustafson A, Kaminker J, Carano RAD, Huang EJ, Foreman O, Weimer R, Scearce-Levie K, Lewcock JW. 2017. Loss of dual leucine zipper kinase signaling is protective in animal models of neurodegenerative disease. Sci Transl Med 9. doi:10.1126/scitranslmed.aag0394

      Li J, Zhang YV, Asghari Adib E, Stanchev DT, Xiong X, Klinedinst S, Soppina P, Jahn TR, Hume RI, Rasse TM, Collins CA. 2017. Restraint of presynaptic protein levels by Wnd/DLK signaling mediates synaptic defects associated with the kinesin-3 motor Unc-104. Elife 6. doi:10.7554/eLife.24271

      Miller BR, Press C, Daniels RW, Sasaki Y, Milbrandt J, DiAntonio A. 2009. A dual leucine kinase-dependent axon self-destruction program promotes Wallerian degeneration. Nat Neurosci 12:387–389.

      Nihalani D, Merritt S, Holzman LB. 2000. Identification of structural and functional domains in mixed lineage kinase dual leucine zipper-bearing kinase required for complex formation and stress-activated protein kinase activation. J Biol Chem 275:7273–7279.

      Russo A, DiAntonio A. 2019. Wnd/DLK is a critical target of FMRP responsible for neurodevelopmental and behavior defects in the Drosophila model of fragile X syndrome. Cell Rep 28:2581–2593.e5.

      Shin JE, Cho Y, Beirowski B, Milbrandt J, Cavalli V, DiAntonio A. 2012. Dual leucine zipper kinase is required for retrograde injury signaling and axonal regeneration. Neuron 74:1015–1022.

      Siu M, Sengupta Ghosh A, Lewcock JW. 2018. Dual Leucine Zipper Kinase Inhibitors for the Treatment of Neurodegeneration. J Med Chem 61:8078–8087.

      Valakh V, Walker LJ, Skeath JB, DiAntonio A. 2013. Loss of the spectraplakin short stop activates the DLK injury response pathway in Drosophila. J Neurosci 33:17863–17873.

      Verma S, Khurana S, Vats A, Sahu B, Ganguly NK, Chakraborti P, Gourie-Devi M, Taneja V. 2022. Neuromuscular junction dysfunction in amyotrophic lateral sclerosis. Mol Neurobiol 59:1502–1527.

      Wlaschin JJ, Donahue C, Gluski J, Osborne JF, Ramos LM, Silberberg H, Le Pichon CE. 2023. Promoting regeneration while blocking cell death preserves motor neuron function in a model of ALS. Brain 146:2016–2028.

      Xiong X, Collins CA. 2012. A conditioning lesion protects axons from degeneration via the Wallenda/DLK MAP kinase signaling cascade. J Neurosci 32:610–615.

      Xiong X, Hao Y, Sun K, Li J, Li X, Mishra B, Soppina P, Wu C, Hume RI, Collins CA. 2012. The Highwire ubiquitin ligase promotes axonal degeneration by tuning levels of Nmnat protein. PLoS Biol 10:e1001440.

      Xiong X, Wang X, Ewanek R, Bhat P, DiAntonio A, Collins CA. 2010. Protein turnover of the Wallenda/DLK kinase regulates a retrograde response to axonal injury. J Cell Biol 191:211–223.

    1. eLife Assessment

      This study reports that the RNA binding and cardiomyopathy-associated protein RBM20 is expressed in specific populations of neurons in the CNS, where it binds to and regulates the expression of synapse-related RNAs. This is an important finding because it reveals a new mechanism for gene regulation in neurons by an RNA binding protein previously studied in the heart; the authors also provide data to suggest that the mechanism by which RBM20 acts in neurons may be distinct from the splicing regulation studied in cardiac tissue. The data in support of the binding and regulation of RNAs by RBM20 is compelling, using leading edge sequencing methods to determine RNA binding profiles, and cell type specific genetics for evaluation of function.

    2. Reviewer #1 (Public review):

      Summary:

      The authors of this study set out to find RNA binding proteins in the CNS in cell-type specific sequencing data and discover that the cardiomyopathy-associated protein RBM20 is selectively expressed in olfactory bulb glutamatergic neurons and PV+ GABAergic neurons. They make an HA-tagged RBM20 allele to perform CLIP-seq to identify RBM20 binding sites and find direct targets of RBM20 in olfactory bulb glutamatergic neurons. In these neurons, RBM20 binds intronic regions. RBM20 has previously been implicated in splicing, but when they selectively knock out RBM20 in glutamatergic neurons they do not see changes in splicing; they do, however, see changes in RNA abundance, especially of long genes with many introns, which are enriched for synapse-associated functions. These data show that RBM20 has important functions in gene regulation in neurons, which was previously unknown, and they suggest it acts through a mechanism distinct from what has been studied before in cardiomyocytes.

      Strengths:

      The study finds expression of the cardiomyopathy-associated RNA binding protein RBM20 in specific neurons in the brain, opening new windows into its potential functions there.

      The study uses CLIP-seq to identify RBM20 binding RNAs in olfactory bulb neurons.

      Conditional knockout of RBM20 in glutamatergic or PV neurons allows the authors to detect mRNA expression that is regulated by RBM20.

      The data include substantial controls and quality control information to support the rigor of the findings.

      Weaknesses:

      The authors do not fully identify the mechanism by which RBM20 acts to regulate RNA expression in neurons, though they do provide data suggesting that neuronal RBM20 does not regulate alternative splicing, which is an interesting contrast to its proposed mechanism of function in cardiomyocytes. Discovery of the RNA regulatory functions of RBM20 in neurons is left as a question for future studies.

      The study does not identify functional consequences of the RNA changes in the conditional knockout cells, so this is also a question for the future.

    3. Reviewer #2 (Public review):

      Summary:

      The group around Prof. Scheiffele has made seminal discoveries regarding alternative splicing that are reflected by a current ERC advanced grant and landmark papers in eLife (2015), Science (2016), and Nature Neuroscience (2019). Recently, the group investigated proteins that contain an RRM motif in the mouse cortex. One of them, termed RBM20, was originally thought to be muscle-specific and involved in alternative splicing in cardiomyocytes. However, upon close inspection, RBM20 is expressed in a particular set of interneurons (PV positive cells of the somatosensory cortex) in the cortex as well as in mitral cells of the olfactory bulb (OB). Importantly, they used CLIP to identify targets in the OB and heart. Next and quite importantly, they generated a knock-in mouse line with a His-biotin acceptor peptide and a HA epitope to perform specific biochemistry. Not surprisingly, this allowed them to specifically identify transcripts with long introns; however, most of the intronic binding sites were very distant from the splice sites. Closer GO term inspection revealed that RBM20 specifically regulates synapse-related transcripts. In order to get in vivo insight into its function in the brain, the authors generated both global as well as conditional KO mice. Surprisingly, there were no significant differences in RBM20 PV interneurons; however, 409 transcripts were deregulated in OB glutamatergic neurons. Here, CLIP sites were mostly found to be very distant from differentially expressed exons. Furthermore, loss-of-function RBM20 primarily yields loss of transcripts, whereas upregulation appears to be indirect. Together, these results strongly suggest a role of RBM20 in the inclusion of cryptic exons thereby promoting target degradation.

      Strengths:

      The quality of the data and the figures is high, impressive and convincing. The reported results strongly suggest a role of RBM20 in the inclusion of cryptic exons thereby promoting target degradation.

      Weaknesses:

      In their revised manuscript, the authors significantly improved the introduction and results sections, which are now much better suited to a general readership and make the logic of the experiments easier to follow. The discussion has also been expanded, doing better justice to the importance of the findings presented.

      In my opinion, the revised manuscript clearly improved and represents a timely and important study, which provides major new insight into the expression and possible function of RBM20 in tissues outside of muscle.

    4. Reviewer #3 (Public review):

      Summary:

      The authors identified RBM20 expression in neural tissues using cell type-specific transcriptomic analysis. This discovery was further validated through in vitro and in vivo approaches, including RNA fluorescent in situ hybridization (FISH), open-source datasets, immunostaining, western blotting, and gene-edited RBM20 knockout (KO) mice. CLIP-seq and RiboTRAP data demonstrated that RBM20 regulates common targets in both neural and cardiac tissues, while also modulating tissue-specific targets. Furthermore, the study revealed that neuronal RBM20 governs long pre-mRNAs encoding synaptic proteins.

      Strengths:

      • Utilization of a large dataset combined with experimental evidence to identify and validate RBM20 expression in neural tissues.
      • Global and tissue-specific RBM20 KO mouse models provide robust support for RBM20 localization and expression.
      • Employing heart tissue as a control highlights the unique findings in neural tissues.

      Weaknesses:

      • Lack of physiological functional studies to explore RBM20's role in neural tissues.
      • Data quality requires improvement for stronger conclusions.

      Comments on revisions:

      The authors have effectively addressed most of my concerns, which has significantly improved the quality and reliability of the data. While sufficient functional data were not provided, the current findings offer valuable and novel insights into the expression of RBM20 in neurons. I have no further concerns.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public reviews:

      We thank the three reviewers for the constructive suggestions made in the Public Reviews and the Recommendations to Authors. We have now addressed these comments in a revised manuscript as follows:

      (1) We have revised the text according to the reviewer suggestions and provided more detailed explanations in the results and discussion.

      (2) We have uploaded higher resolution images of several figures (resolution had been reduced to achieve lower file sizes) to address the comment regarding “data quality”.

      (3) We have included additional data on eCLIP control experiments in the supplementary figures.

      (4) We have performed additional replications of the western blot analysis for Rbm20 knock-out animals and provided the data in a new Figure.

      Recommendations for the authors:

      Reviewer #1:

      (1) The study is missing CLIP-seq data from control mice that do not express HA, or HA-knocked into a safe-harbor locus. This is important because there is plenty of background HA staining in Figure S2B, in wild-type mice. Including this control would allow subsequent peak calling to distinguish between non-specific HA peaks and RBM20 specific peaks.

      The biochemical conditions used in immunostaining are much less stringent than the buffers employed for immunoprecipitation in the eCLIP protocol. Thus, background staining is not an informative reference to assess specificity of CLIP isolations. In previous experiments, we confirmed very low background with the anti-HA antibodies in our eCLIP protocol. In the present study, we used a “no-crosslinking control” where samples were not irradiated with UV light. This negative control is now included in Supplementary Figure 4.

      (2) The GO analysis performed to infer synapse-gene specific regulation would be more useful if the authors would discuss specific genes that are represented within these terms and have been shown to be associated with neuronal function.

      We have now noted several synapse-related genes identified in the text.

      (3) Some figures would benefit from larger size and higher resolution including Fig S1, S3.

      We had previously embedded Figures as png files in the text document. In the revised version we uploaded the figures in higher resolution as individual jpeg files. Moreover, we now split Figure S1 into two separate supplementary figures (new Fig.S2) which allowed for enlarging the size of panels. We further enlarged the panels of (former) Fig.S3 (now Fig.S4).

      (4) RBP genes in Figure 1A x-axis are all lowercase. This is not standard mouse gene nomenclature.

      We corrected this.

      (5) Typo in Figure S4F rightmost panel y-axis - 'Length' is misspelled.

      We corrected this.

      Reviewer #2:

      Minor points:

      - Briefly explain DESeq2 (p4)

      We now added a brief note and corresponding reference in the main text of the manuscript.

      - Is RBM20 a shuttling protein? Any detection in the cytoplasm?

      Our immunostainings for the endogenous RBM20 in heart and olfactory bulb cells suggest that the vast majority of wild-type RBM20 is localized to the nucleus. Previous work on RBM20 disease mutants suggest that pathological forms can accumulate in the cytoplasm. However, with the sensitivity of our detection we did not obtain evidence for a significant cytoplasmic pool in neurons. This does not exclude the possibility that the protein is shuttling – but assessing this would require different types of experiments.

      Reviewer #3:

      (1) Figure 1C: It is shown that some of the RBM20 staining do not colocalize with PV. This observation requires further explanation and discussion to clarify the significance.

      As seen in the fluorescent in situ hybridizations as well as the RiboTRAP purifications (Fig.S1C,D), we observe RBM20 mRNA expression not only in parvalbumin-positive interneurons but also in somatostatin-positive cells of the neocortex. Accordingly, some RBM20-positive cells do not express parvalbumin. We now clarified this in the text.

      Additionally, in Figure S1C, the resolution of the image is low, making it difficult to conclusively determine whether RBM20 RNA is localized in the nucleus. A high-resolution image would be beneficial to address this ambiguity.

      The Rbm20 mRNA is localized in the nucleus and cytoplasm. We have now split Figure S1 into two separate figures to enlarge the panels for S1C and make this more visible. Moreover, we uploaded higher resolution figure files.

      (2) Figure 1E: The molecular weight of RBM20 is approximately 135 kDa, yet there is a band near 135 kDa in the KO heart. How do the authors determine that the 150 kDa band represents RBM20 rather than the 135 kDa band? The authors may consider increasing the sample size to confirm whether the smaller band consistently appears across all KO heart tissues.

      We appreciate that in this higher molecular weight range, the indicated weight markers may not be entirely accurate. We used a validated knock-out mouse line to identify the appropriate RBM20 protein band. As the 150 kDa band was reproducibly lost in the knock-out brain and heart tissue, whereas the fainter band of lower apparent molecular weight remained, we concluded that on our gel system RBM20 protein has an apparent molecular weight of 150 kDa. This is further supported by the fact that the endogenously tagged RBM20 protein also has a similar mobility.

      As suggested by the reviewer, we now re-ran Western blots from multiple wild-type and corresponding knock-out tissues. This further confirmed the migration of the protein and loss of the 150 kDa band in the mutant mice (new Figure 1E).

      (3) Figure 2A: A higher-resolution image is recommended. Prior studies on RBM20 mutation knock-in mice suggest that when RBM20 localizes to the cytoplasm, it promotes molecular condensate formation. This seems to be the case in Figure 2A; however, the low image quality makes it difficult to see these molecular condensates.

      Figure 2A shows endogenous RBM20 (not the epitope-tagged protein in the knock-in mice). The vast majority of the protein is localized in the nucleus rather than the cytoplasm. We are a bit uncertain which “condensates” the reviewer refers to. In the heart, we indeed see accumulations of RBM20 in foci (as described previously in the literature). As judged by their location within the DAPI-positive area, these foci are in the nucleus. By contrast, in the olfactory bulb neurons (which express lower levels of RBM20) we do not see a comparable concentration in nuclear foci but rather broad and diffuse staining. This is consistent with the hypothesis that the nuclear foci depend on the expression of highly expressed target transcripts such as titin. To better visualize this, we now uploaded files with higher resolution for the revised manuscript.

      (4) Figure 4D: This figure is not cited in the main text and should be referenced appropriately.

      We corrected this.

      (5) Page 5: The sentence "Finally, introns bound by RBM20 were significantly longer than expected by chance as assed..." contains a typo. The word "assed" should be corrected to "assessed".

      We corrected this.

      (6) Functional data: The study would benefit from functional experiments to elucidate the physiological role of RBM20 in PV neurons. For instance, since RBM20 regulates calcium-handling genes in neurons, does its absence impair calcium signaling in PV neurons? Additionally, given that RBM20 is involved in synaptic regulation, could RBM20 KO disrupt synaptic function? While it may not be feasible to address all these questions, providing some functional data would greatly enhance the overall significance of the study.

      We completely agree with the reviewer that this would greatly advance the study and that the lack of data on cellular functions is the most significant limitation of this work. We attempted to obtain insights into cellular function through the structural investigations (Fig.S5). We had obtained some data on a behavioral phenotype in the mice which indicates that knock-out in vGLUT2 neurons precipitates alterations in behavior. However, due to conditions in our animal facility (emissions from construction), we struggled to solidify/confirm this data. Thus, in the interest of sharing the existing data in a timely manner, we felt that more elaborate functional studies on synaptic transmission or calcium imaging would be better performed in a separate effort.

    1. eLife Assessment

      This study presents a useful method based on flow cytometry to study partitioning noise during cell division. The evidence supporting the claims of the authors is incomplete, as the method neglects other sources of noise present in cells. With the theoretical part extended, this paper would be of interest to cell biologists and biophysicists working on asymmetric partitioning during cell division.